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1. Introduction 



1.1. Purpose of this Guide 

The purpose of this data guide is to provide users of the National Household Education Survey (NHES) 
data with suggested techniques for working with the data files. Special attention is paid to topics that will 
help users avoid the most commonly made mistakes in working with NHES data. Answers to some 
questions that are frequently asked about the NHES are also included in appendix A. This guide is meant 
to serve as an introduction and overview, and not as a replacement for the separate User’s Manuals and 
other reports currently available. 



1.2. Overview of the National Household Education Survey 

The NHES is a data collection system of the National Center for Education Statistics (NCES). It provides 
descriptive data on the educational activities of the U.S. population and offers policymakers, researchers, 
and educators a variety of statistics on the condition of education in the United States. Although the 
primary purpose of the NHES is to conduct repeated measurements of the same phenomena at different 
points in time, one-time surveys on topics of interest to the Department of Education may also be fielded. 

The NHES is a telephone survey of the noninstitutionalized civilian population of the United States. 
Households are selected for the survey using random digit dialing (RDD) methods. Data are collected 
using computer assisted telephone interviewing (CATI) procedures. The methodology for any single 
fielding of the NHES is linked to the research issues under study, the level of data required to address 
these issues, and how precise the estimates generated from the survey data need to be in order to meet 
the objectives of the study. The NHES has been conducted in 1991, 1993, 1995, and 1996. Each of 
these collections has included two larger topical components; the 1996 survey included a small third 
component as well. 



1.2.1. NHES:91 

The topics addressed by the NHES:91 were early childhood education and adult education. About 60,000 
households were screened for the NHES:91. In the Early Childhood Education (ECE) component, 
13,892 parents/guardians of 3- to 8-year-olds completed interviews about their children’s early educational 
experiences. Included in this component were participation in nonparental care/education, characteristics 
of programs and care arrangements, and early school experiences including delayed kindergarten entry 
and retention in grade. In addition to questions about care/education arrangements and school, parents 
were asked about activities children engaged in with parents and other family members inside and outside 
the home. Information on family, household, and child characteristics was also collected. 

In the NHES:91 Adult Education (AE) component, 9,774 persons 16 years of age and older, identified 
as having participated in an adult education activity in the previous 12 months, were questioned about 
their activities. Information collected on up to four courses included the subject matter, duration, 
sponsorship, purpose, and cost. A smaller sample of nonparticipants (n = 2,794) also completed 
interviews about barriers to participation. Information on the household and the adult’s background and 
current employment was also collected. 
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1.2.2. NHES:93 



In the NHES:93, about 64,000 households were screened. Nearly 11,000 parents of children aged 3 
through 7 or in second grade or below completed interviews for the School Readiness (SR) component 
(n = 10,888). Topics included in this component were the developmental characteristics of preschoolers, 
school adjustment and teacher feedback to parents for kindergartners and primary students, center-based 
program participation, early school experiences, home activities with family members, and health status. 
Extensive family and child background characteristics, including parent language and education, income, 
receipt of public assistance, and household composition, were collected to permit the identification of at- 
risk children. 

In the School Safety and Discipline (SS&D) component, 12,680 parents of children in grades 3 through 
12 and 6,504 youth in grades 6 through 12 were interviewed about their school experiences. Topics 
included the school learning environment, discipline policy, safety at school, victimization, availability 
and use of alcohol/drugs, and alcohol/drug education. Peer norms for behavior in school and substance 
use were also included in this topical component. Extensive family and household background 
information and data about characteristics of the school attended by the child were collected. 



1.2.3. NHES:95 

The two survey components of the NHES:95, Early Childhood Program Participation (ECPP) and Adult 
Education (AE), addressed the same topics as the NHES:91 . Over 45,000 households were screened for 
the NHES:95. In the ECPP component, parents of 14,064 children from birth through third grade were 
asked about their children’s participation in care or education provided by relatives, nonrelatives, Head 
Start programs, and center-based programs. The ECPP survey also collected information on early school 
experiences for school-age children, home literacy activities, health and disability status, and parent and 
family characteristics. 

For the AE interview, adults (i.e., persons age 16 and older not currently enrolled in secondary school) 
were asked about their participation in basic skills courses, English as a second language (ESL) courses, 
credential (degree or diploma) programs, apprenticeships, work-related courses, personal 
development/interest courses, and interactive video or computer training on the job. Information on 
programs or courses included the subject matter, duration, cost, location and sponsorship, and employer 
support. Nonparticipants in selected types of adult education were asked about their interest in 
educational activities and barriers to participation. Extensive background, employment, and household 
information was collected for each adult. Altogether, 19,722 adults were interviewed. 



1.2.4 NHES:96 

The NHES:96 addressed the components of Parent and Family Involvement in Education (PFI) and Civic 
Involvement (Cl). Almost 56,000 households were screened for the NHES:96. In the PFI component, 
parents of 20,792 children aged 3 through 20 and in 12th grade or below were asked about their 
children’s schools, communication with teachers or other school personnel, school practices to involve 
parents, their children’s homework and behavior, and learning activities with children outside of school. 
Other information collected in this component pertained to student experiences, children’s personal and 
demographic characteristics, household characteristics, and children’s health and disability statuses. 
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The Cl component focused on sources of information, civic participation, and knowledge and attitudes 
about government of youth in grades 6 through 12 and their parents, as well as a representative sample 
of U.S. adults. The Cl component also provided an assessment of the opportunities that youth have to 
develop the personal responsibility and skills that would facilitate their taking an active role in civic life. 
Interviews were completed with 9,393 parents of students in grades 6 through 12 (including home- 
schooled students in those grades), 8,043 youth in grades 6 through 12, and 2,250 adults. 



1.3. Research Issues that Can Be Addressed with the NHES Data 

Each component collects specific data based on a set of research questions that guided the development 
of the survey component. This section lists the research questions pertinent to each component. A list 
of publications is also provided where appropriate. To obtain single copies of the publications listed 
in this section, call (703) 845-3151. 

As the research questions shown below illustrate, a wide range of research issues can be addressed by 
analysts using the NHES data. However, these lists are not exhaustive. Analysts should review the 
survey instrument for each component to identify additional areas of particular interest to them. 



1.3.1. NHES:91 Early Childhood Education (ECE) 

The following research questions guided the development of the NHES:91 ECE component: 

1 . What developmental and education experiences do children bring to school from their 
homes? 

2. What is the level of participation in early childhood programs, and what are the 
characteristics of this participation? 

3. To what extent are parents delaying their children’s entry into school? 

4. To what extent are parents involved in their children’s early childhood program 
participation and schooling? 

5. To what combinations of educational experiences are children exposed? 

6. What are the paths that children take into primary school? 

7. What are the early school experiences of children? 

8. What are the rates of retention in kindergarten, first grade, and second grade? 

Reports using the ECE data published by NCES include: Home Activities of 3- to 8-year-olds (NCES 
Publication No. 92-004), Experiences in Child Care and Early Childhood Programs of First and Second 
Graders (NCES Publication No. 92-005), Profile of Preschool Children 's Child Care and Early Education 
Program Participation (NCES Publication No. 93-133), and Access to Early Childhood Programs for 
Children at Risk (NCES Publication No. 93-372). 



1.3.2. NHES:91 Adult Education (AE) 

The following research questions guided the development of the NHES:91 AE component: 

1. To what extent are educationally needy adults receiving basic skills and language 
training? 

2. To what extent are adults pursuing employment- and career-related training? Are the 
most needy (those with low educational attainment and fewer job skills, those who have 
been out of the labor force) receiving training? 

3. Who is providing adult education? Who pays for it? What is the involvement of 
business, industry, and other groups in adult education activities? 

4. What is the pattern of adult education for involved adults? What is the content, format, 
and intensity of adult education activities? 

5. Why do adults participate? What advantages do adults perceive from their participation 
in adult education? 

6. What factors present barriers to participation in adult education activities? 

NCES reports based on the NHES:91 AE component include: Adult Education: Main Reasons for 
Participating (NCES Publication No . 93-45 1 ) and Adult Education : Employment-Related Trainin g (NCES 
Publication No. 94-471). 



1.3.3. NHES:93 School Readiness (SR) 

The development of the NHES:93 SR component was guided by the following research questions: 

1 . What is the level of parent/child interaction for young children? 

2. What sources of information and advice on child development and education do parents 
use? 

3. What is the general health/nutritional status of children? 

4. In what early childhood programs do preschool children participate? 

5. What early childhood programs did first and second graders participate in prior to 
entering kindergarten (or prior to first grade if they did not attend kindergarten)? 

6. How many children experience adjustment problems when they enter kindergarten? 

7. How many children experience adjustment problems in primary grades? 

8. How many children are retained in kindergarten, placed in transitional grades, or 
retained in primary grades? 
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Reports that have been prepared using the SR component include the following: Readiness for 

Kindergarten: Parent and Teacher Beliefs (NCES Publication No. 93-257), Approaching Kindergarten: 
A Look at Preschoolers in the United States (NCES Publication No. 95-280), and Family-Child 
Engagement in Literacy Activities: Changes in Participation Between 1991 and 1993 (NCES Publication 
No. 95-689). Each of these reports addresses one or more of the research questions above. 



1.3.4. NHES:93 School Safety and Discipline (SS&D) 

The following research questions were used to guide the development of the NHES:93 SS&D component: 

1 . How do parents and students perceive the learning environment of the school? 

2. How safe do parents and students believe the school environment to be? 

3. How is the school’s discipline policy perceived? 

4. To what extent are tobacco, alcohol, and/or other drug use perceived by parents and 
students as a problem in school? 

5. Do parents and students identify alcohol and other drug education programs at school? 

6. To what extent do family norms and behavior support appropriate behavior in school? 

Several reports have also been prepared using the NHES:93 SS&D component. These reports include 
Student Victimization at School (NCES Publication No. 95-204), Student Strategies to Avoid Harm at 
School (NCES Publication No. 95-203), Gangs and Victimization at School (NCES Publication No. 95- 
740), Use of School Choice (NCES Publication No. 95-742R), and Parent and Student Perceptions of the 
Learning Environment at School (NCES Publication No. 93-281). 



1.3.5. NHES:95 Early Childhood Program Participation (ECPP) 

The development of the NHES:95 ECPP component was guided by the following research questions: 

1 . To what extent do children receive nonparental care and early childhood education? 

2. Do at-risk children have the same access to nonparental care and education programs as 
children as a whole or as advantaged children? 

3. What experiences do children have with nonparental child care or early childhood 
programs prior to enrolling in kindergarten or primary school? 

4. How has the nonparental care or education children receive changed from 1991 to 1995? 

5. To what extent do kindergartners and primary school children receive nonparental care 
by relatives, by nonrelatives, and in center-based or school-based before-and-after school 
programs, in addition to their enrollment in school? 

6. How many primary school children return home from school to sibling care or self-care? 
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7. To what extent do children experience continuity or discontinuity in child care 
arrangements and early childhood programs? 

8. To what extent do children in different types of arrangements have different care or 
education experiences? 

9. How are parent expenditures associated with receipt and quality of nonparental care or 
education? 

10. How are children’s school experiences associated with receipt of nonparental care and 
early childhood programs prior to starting kindergarten? 

1 1 . What is the prevalence of home schooling among children of kindergarten age and in 
kindergarten through third grade? 

12. Which aspects of nonparental care and education are most important to parents? 

To date, one report has been prepared using the ECPP component and published by NCES. This report 
is Child Care and Early Education Program Participation of Infants, Toddlers, and Preschoolers (NCES 
Publication No. 95-824). Additional reports are forthcoming, including one pertaining to children’s 
delayed entry into kindergarten and retention in kindergarten. 



1.3.6. NHES:95 Adult Education (AE) 

The development of the NHES:95 AE component was guided by the following research questions: 

1 . To what extent do adults participate in educational activities, and what changes occurred 
in that participation from 1991 to 1995? 

2. What are the main reasons for participation in adult education, and have these main 
reasons changed from 1991 to 1995? 

3. What are the main reasons adults do not take part in adult education? 

4. Who are the providers of various types of adult education? 

5. How much time do adults spend in educational activities? 

6. What is the involvement of employers and labor unions in the provision of adult 
education? 

7. To what extent do adults use their own resources to participate in educational activities? 

8. In what programs or courses do adults take part? 

9. To what extent do adults participate in career-or job-related activities? 

10. To what extent do adults use what they learn in educational activities? 
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To date, one report has been published by NCES for the AE component, entitled Forty Percent of Adults 
Participate in Adult Education Activities: 1994-95 (NCES Publication No. 95-823). Other reports are 
forthcoming, including those covering the topics of participation in basic skills education, participation 
English as a Second Language (ESL) education, and participation in work-related education activities. 



1.3.7 NHES:96 Parent and Family Involvement in Education (PFI) 

The development of the NHES:96 PFI component was guided by the following research questions: 

1. In what ways are parents or other household members involved with children’s 
schooling? 

2. In what ways are parents or other household members involved with their child’s 
homework and behavior? 

3. In what ways are family members involved in children’s education that are not directly 
related to school or homework? 

4. In what ways do teachers or other school personnel communicate with parents/families? 

5. What types of school practices to involve and support families do parents report? 

6. To what extent are parents of preschoolers receiving the training and support that they 
need? 

To date, one report has been prepared for publication by NCES for the PFI component, entitled Parents ’ 
Reports of School Practices to Involve Families (NCES Publication No. 96-327). Other reports will be 
forthcoming, including one covering the topic of Parent Involvement in Schools. 



1.3.8 NHES:96 Civic Involvement (Cl) 

The development of the NHES:96 Cl component was guided by the following research questions: 

1. To what extent do parents and other household members promote civic awareness and 
personal responsibility in their children? 

2. To what extent do 6th through 12th grade students engage in activities that promote or 
indicate personal responsibility, cooperative behavior, and/or leadership opportunities? 

3. To what extent do 6th through 12th grade students participate in service activities? 

4. To what extent do 6th through 12th grade students report engaging in or witnessing 

activities that may encourage civic awareness? 

5. To what extent do adults participate in community service activities? 

6. To what extent do adults participate in political activities? 
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To date, three reports have been prepared for publication by NCES for the Cl component, entitled Adult 
Civic Involvement in the United States (NCES Publication No. 97-906), Student Participation in 
Community Service Activity (NCES Publication No. 97-331), and Student Interest in National News and 
its Relation to School Courses (NCES Publication No. 97-970). 



1.4. Must Read Publications 

Before a researcher attempts to use the NHES data files, it is strongly suggested that time be spent 
reading the NHES Data File User’s Manuals. The chapters of the User’s Manuals can be found on the 
CD-ROM in WordPerfect 5.1 (.WP5) format. The following list of documents will provide researchers 
with comprehensive information that will help them understand the complexities of the NHES data files. 

National Household Education Survey of 1991, Preprimary and Primary Data Files User’s Manual 
(NCES Publication No. 92-057) - User’s Manual covering the ECE component administered in 1991. 

National Household Education Survey, Adult and Course Data Files User’s Manual (NCES 
Publication No. 92-019) - User’s Manual covering the AE component administered in 1991. 

National Household Education Survey, School Readiness Data File User’s Manual (NCES 
Publication No. 94-193) - User’s Manual covering the SR component administered in 1993. 

National Household Education Survey, School Safety and Discipline Data File User’s Manual (NCES 
Publication No. 94-218) - User’s Manual covering the SS&D component administered in 1993. 

National Household Education Survey, Early Childhood Program Participation Data File User’s 
Manual (NCES Publication No. 96-825) - User’s Manual covering the ECPP component administered 
in 1995. 

National Household Education Survey, Adult Education Data File User’s Manual (NCES Publication 
No. 96-826) - User’s Manual covering the AE component administered in 1995. 

National Household Education Survey, Data File User’s Manual Volume I (NCES Publication No. 
97-425) - User’s Manual containing information on the 1996 National Household Education Survey as 
a whole. 

National Household Education Survey, Data File User’s Manual Volume II: Household & Library 
Data File (NCES Publication No. 97-424) - User’s Manual covering the 1996 Screener/Household & 
Library interview. 

National Household Education Survey, Data File User’s Manual Volume III: Parent and Family 
Involvement in Education and Civic Involvement Data File (NCES Publication No. 97-423) - User’s 
Manual covering the 1996 Parent PFI/CI component. 



National Household Education Survey, Data File User’s Manual Volume IV: Youth Civic 
Involvement Data File (NCES Publication No. 97-422) - User’s Manual covering the 1996 Youth Cl 
component. 

National Household Education Survey, Data File User’s Manual Volume V: Adult Civic Involvement 
Data File (NCES Publication No. 97-421) - User’s Manual covering the 1996 Adult Cl component. 
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1.5. Availability of Technical Expertise 



Staff at NCES and Westat have been closely associated with important components of the NHES design, 
instrumentation, sampling, data collection, data processing, and analyses that have occurred over the span 
of this project. These individuals, their area of expertise, and their phone numbers are included below. 



Overall Knowledge of the NHES 



Kathryn Chandler (NCES) 
Dan Kasprzyk (NCES) 
Mary Collins (Westat) 


(202) 219-1767 
(202) 219-1588 
(301) 251-4273 


Statistical Procedures 

Mike Brick (Westat) 


(301) 294-2004 


NHES Components 




NHES:91 Early Childhood Education 
Kathryn Chandler (NCES) 

Mary Collins (Westat) 


(202) 219-1767 
(301) 251-4273 


NHES:91 Adult Education 
Roslyn Korb (NCES) 
Peter Stowe (NCES) 

Carin Celebuski (Westat) 
Kwang Kim (Westat) 


(202) 219-1587 
(202) 219-2099 
(301) 294-3986 
(301) 517-4078 


NHES. 93 School Readiness 
Kathryn Chandler (NCES) 
Mary Collins (Westat) 


(202) 219-1767 
(301) 251-4273 


NHES:93 School Safety and Discipline 
Kathryn Chandler (NCES) 

Mary Jo Nolin (Westat) 


(202) 219-1767 
(301) 294-2031 


NHES:95 Early Childhood Program Participation 
Kathryn Chandler (NCES) (202) 219-1767 

Mary Collins (Westat) (301) 251-4273 

Laura Loomis (Westat) (301) 517-4049 


NHES:95 Adult Education 
Roslyn Korb (NCES) 
Peter Stowe (NCES) 
Kwang Kim (Westat) 


(202) 219-1587 
(202) 219-2099 
(301) 517-4078 
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NHES:96 Parent and Family Involvement in Education 



Kathryn Chandler (NCES) 
Mary Collins (Westat) 

Nancy Vaden-Kieman (Westat) 



(202) 219-1767 
(301) 251-4273 
(301) 517-4044 



NHES.96 Civic Involvement 
Kathryn Chandler (NCES) 
Chris Chapman (ESSI) 
Mary Collins (Westat) 

Mary Jo Nolin (Westat) 



(202) 219-1767 
(202) 219-4182 
(301) 251-4273 
(301) 294-2031 



2. Brief Descriptions of the Separate NHES Data Files 



The purpose of this section is to provide a brief description of the separate NHES data files. Analysts 
are reminded of the importance of carefully reviewing the NHES Data File User’s Manuals prior to 
conducting analyses of the NHES data. The User’s Manuals provide extensive information on the 
sampling, methodology, questionnaires, data collection procedures, use of weights, and data file layouts. 



2.1. NHES:91 Early Childhood Education Preprimary Data File 

The NHES:91 Preprimary file contains data from 7,655 completed interviews with parents of children 
age 3 through kindergarten. Included are variables from the Screener and Preprimary interviews, derived 
variables, weights, and replicate weights. Users are able to merge this file with the NHES:91 Primary 
file for the purposes of conducting analyses involving all children (see the Data File User’s Manual for 
a discussion of this procedure). 



2.2. NHES:91 Early Childhood Education Primary Student Data File 

The NHES:91 Primary file contains data from 6,237 completed interviews with parents of children in first 
grade through age 8, plus 9-year-olds in first or second grade. Included are variables from the Screener 
and Primary School interviews, derived variables, weights, and replicate weights. Users are able to 
merge this file with the NHES:91 Preprimary file for the purposes of conducting analyses involving all 
children (see the Data File User’s Manual for a discussion of this procedure). 



2.3. NHES:91 Adult Education (AE) Data File 

The NHES:91 Adult file contains responses from each completed AE participant and nonparticipant 
interview. Included are variables pertaining to courses taken and reported by each participant 

respondent. In total, there are 12,568 completed Adult interviews in this file; this includes 9,774 
participant and 2,794 nonparticipant interviews. 



2.4. NHES:91 Adult Education (AE) Course Data File 

The NHES: 91 AE Course file contains information for each course reported by AE participants in the 
AE interview. Note that course information is also included for each adult included in the Adult File. 
In the Course file, the unit of analysis is each course described by adult respondents in completed AE 
interviews; there are 17,612 courses in the NHES:91 Course file. 



2.5. NHES:93 School Readiness (SR) Data File 

This file contains data from 10,888 completed SR interviews with parents of children age 3 through 7 
plus children up to age 9 in second grade or below. The file is organized so that logically related sets 
of variables are grouped together. The data items are listed in the file in the following order: 
identification (or system) variables, household membership information, questionnaire item variables, 
derived variables, weights, replicate weights for variance estimation, and imputation flag variables. 
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2.6. NHES:93 School Safety and Discipline (SS&D) Parent and Youth Data File 



In total, there were 19,184 completed NHES:93 SS&D interviews. Of this total, 12,680 interviews were 
completed by parents of third through twelfth graders and 6,504 were completed by youth in the sixth 
through twelfth grades. There is a separate case in the file for each interview completed; therefore there 
are 12,680 parent cases and 6,504 youth cases. For each of the 6,504 youth cases, there is a 
corresponding parent case with which it can be linked for dyad analysis. The data items are included in 
the following order: identification (or system) variables, household membership information, 
questionnaire item variables, derived variables, weights, replicate weights for variance estimation, and 
imputation flag variables. 



2.7. NHES:95 Early Childhood Program Participation (ECPP) Data File 

This file contains data from 14,064 completed ECPP interviews with parents of children age 10 or 
younger and in the third grade or below. The file is organized so that logically related sets of variables 
are grouped together. The data items are listed in the file in the following order: identification (or 
system) variables, household membership information, questionnaire item variables, derived variables, 
weights, replicate weights for variance estimation, imputation flag variables, and other flag variables. 



2.8. NHES:95 Adult Education (AE) Data File 

This file contains data from 19,722 completed AE interviews with adults aged 16 or older not enrolled 
in elementary or secondary school and not on active duty in the U.S. Armed Forces. Of the 19,722 
adults who completed interviews, 11,713 were identified as participants in adult education activities and 
8,009 as nonparticipants. Similar to the other files, logically related sets of variables in the AE data file 
are grouped together. The data items are listed in the file in the following order: identification (or 
system) variables, household membership information, questionnaire item variables, derived variables, 
weights, replicate weights for variance estimation, imputation flag variables, and other flag variables. 



2.9. NHES:96 Household & Library Data File 

This file contains data from 55,708 completed Screener interviews. Thie data items are listed in the file 
in the following order: identification (or system) variables, household member summary variables, public 
library use variables, household characteristics variables, household membership variables, derived 
variables, weights, replicate weights for variance estimation, and imputation flag variables. 



2.10. NHES:96 Parent and Family Involvement in Education and Civic Involvement (Parent 
PFI/CI) Data File 

This file contains data from 20,792 completed Parent PFI/CI interviews with parents of children aged 3 
through 20 and in 12th grade or below. The data items are listed in the file in the following order: 
identification (or system) variables, household membership variables, questionnaire item variables, 
household characteristics variables, derived variables, weights, replicate weights for variance estimation, 
and imputation flag variables. 
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2.11. NHES:96 Youth Civic Involvement (Youth Cl) Data File 



This file contains data from 8,043 completed Youth Cl interviews with youth in grades 6 through 12. 
The data items are listed in the file in the following order: identification (or system) variables, household 
membership variables, questionnaire item variables, household characteristics variables, derived variables, 
weights, replicate weights for variance estimation, and imputation flag variables. 



2.12. NHES:Adult Civic Involvement (Adult Cl) File 

This file contains data from 2,250 completed Adult Cl interviews with adults 18 years old or older, not 
enrolled in grade 12 or below, and not on active duty in the military. The data items are listed in the 
file in the following order: identification (or system) variables, household membership variables, 
questionnaire item variables, household characteristics variables, derived variables, weights, replicate 
weights for variance estimation, and imputation flag variables. 



2.13 Adult Education Course Code Merge Files 

The Adult Education Course Code Merge files contain the course code variables for each reported course. 
These merge files cannot be used by themselves for analysis; instead, they are designed to be used with 
the AE public data files (i.e., NHES:91 Adult file, NHES:91 Course file, and NHES:95 AE file). The 
course code merge files for each public AE data set are as follows: CC91A.DAT for the NHES:91 Adult 
file; CC91C.DAT for the NHES:91 Course file; and CC95.DAT for the NHES:95 AE file. In the 
NHES:91, course information was collected for the four courses taken most recently by the respondent. 
In the NHES:95, information was collected about courses taken in a part-time credential program, taken 
for work-related reasons, and taken for personal development or personal interest. 
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3. Comparisons with Other Data Sets 



In addition to conducting analyses with the NHES data sets, analysts may wish to compare NHES 
estimates to those obtained in other large-scale data collections. When making such comparisons, 
researchers should carefully review methodological and wording differences that may affect responses. 
This section briefly describes some data sets that address issues related to the NHES components. 



3.1. Early Childhood Education and Related Issues: NHES:91 Early Childhood Education, 
NHES:93 School Readiness, and NHES:95 Early Childhood Program Participation 

The early childhood components of the NHES have addressed participation in child care and early 
childhood programs, home activities, health status, and a number of related issues. Data users may wish 
to compare some estimates, for example, center-based program participation or frequency of home 
activities, across NHES early childhood data sets. In addition, several extant data sources include similar 
items. 



■ The Current Population Survey October Education Supplement (Bureau of the Census) has 
collected information on enrollment in nursery school and school for many years. Since 
many parents report center-based early childhood programs of many types (including day 
care centers) in response to these items, estimates of participation in such programs can 
be compared. In addition, estimates of retention in early grades can be analyzed with 
both the Current Population Survey (CPS) and the NHES data. 

■ The 1990 Current Population Survey October Education Supplement replicated several 
NHES items on home activities in which parents engage with their children. 

■ The National Health Interview Survey Child Health Supplement of 1988 (National Center 
for Health Statistics) collected information on participation in child care and early 
childhood education programs and extensive information on the health status of children. 

■ The Survey of Income and Program Participation (Bureau of the Census) is a recurring 
survey that periodically includes a supplement that collects information on the child care 
and early childhood program participation of children of mothers who are employed or 
enrolled in school or job training. Child care supplements were administered in 1991 and 
1993. 



3.2. Adult Education, NHES:91 and NHES:95 

The NHES Adult Education component fills an important information need, since there are no other 
current national data available on the participation of adults in the broad range of adult education 
activities. As a result, researchers will not find current data available for comparison. However, those 
analysts interested in participation over time may wish to examine the available CPS data on this topic. 

■ From 1969 through 1984, the Current Population Survey (Bureau of the Census) included 
a triennial supplement on participation in adult education. Supplements were administered 
during the month of May in 1969, 1972, 1975, 1978, 1981, and 1984. These data can 



be used to examine the participation of adults over time and in comparison with current 
levels of participation. 

In the 1992 Current Population Survey October Education Supplement, sets of items 
measuring adult education participation as done in previous CPS administrations and in 
the NHES:91 were included. The purpose of this was to assess the extent to which 
differences in measurement may be responsible for observed differences in participation 
rates. 



3.3. School Safety and Discipline 

■ Monitoring the Future (National Institute on Drug Abuse) gathers information annually 
on the prevalence and incidence of the illicit drug use of 12th graders. In addition, it 
contains questions designed to describe and explain changes in many important values, 
behaviors, and lifestyle orientations of American youth. The survey was first conducted 
in 1975, and the sample was expanded in 1991 to include 8th and 10th grade students. 

■ The purpose of the 1989 National Crime Victimization Survey, School Crime Supplement 
(U.S. Department of Justice, Bureau of Justice Statistics) was to provide detailed 
information on personal crimes of violence and theft that were committed inside a school 
building or on school property. The basic National Crime Victimization Survey is an 
annual ongoing effort. The School Crime Supplement was repeated in 1995. 

■ The National Education Longitudinal Study of 1988 (National Center for Education 
Statistics) represents a major longitudinal effort designed to provide trend data about 
critical transitions experienced by 8th grade students as they leave middle/junior high 
school and progress through high school into college or their careers. Data from this 
study can be used to examine educational issues such as school environment issues, school 
discipline issues, victimization at school, and drug and alcohol education. 

3.4 Parent and Family Involvement in Education: NHES:96 

■ The National Education Longitudinal Study of 1988 also contains items relevant to the 
NHES:96 PFI component. For example, data analysts may wish to examine NELS:88 
data in conjunction with NHES:96 PFI estimates on school contacts to parents (by parent 
report) and frequency of parents helping the child with his or her homework. 



3.5 Civic Involvement: NHES:96 

■ In the 1995 Current Population Survey October Education Supplement, sets of items 
measuring the percentage distribution of the adult population, age and gender of the adult 
population, household income distributions, and race/ethnicity by highest level of 
education may be compared with various NHES:96 Adult Cl estimates. 

■ The 1992 National Adult Literacy Survey (NALS) collected data on adults’ activities in 
daily life that require English literacy skills. Data from the NALS may be used for 
comparisons with NHES:96 Adult Cl estimates of frequency of newspaper reading among 
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adults, the number of magazines read regularly by adults, and the percentage of adults 
reading any books in the last six months. 

■ The General Social Survey (GSS) measures trends in adults’ attitudes and reports of 
behaviors regarding a wide range of issues. Areas common to the 1994 GSS and the 
NHES:96 adult Cl component include organizational membership, various political or 
civic activities, and attitudes about freedom of speech. 

■ The National Election Study (NES) collects data on voting, public opinion and political 
participation and knowledge during election years. Several NHES:96 items from the Cl 
component, addressing political knowledge, were drawn from the NES and can be used 
for direct comparisons. 

■ The Citizens’ Political and Social Participation (CPSP) survey measures the extent and 
variety of voluntary social and political activity among Americans and the causes of that 
engagement. 

■ The Washington Post/Kaiser Family Foundation/Harvard University Survey Project 
combines survey research and newspaper reporting for the purpose of examining and 
reporting public knowledge, perceptions, and attitudes about the role of American 
government. 

■ The National Survey of High School Seniors elicits detailed information on political and 
relevant non-political matters so that parent-child similarities and differences can be 
assessed. 




17 



23 



4. Getting to Know the Data 



This section of the Guide provides information to help to ensure successful use of the NHES data. 
Included are general discussions of the data collection and data processing procedures used in preparing 
the data, as well as some potential pitfalls in file preparation. 



4.1. NHES Data Collection and Processing 

While researchers conducting secondary analysis were not involved directly in the collection and 
processing of data, an understanding of these processes is important to thoughtful and appropriate use of 
the data. The method of data collection, the length of the interview, the process for cleaning the survey 
data during and after data collection, quality control activities, and the development of weights are all 
important considerations. For example: 

■ Some respondents did not complete the entire interview, but completed enough so that the 
interview was included as a partial complete. 

■ In the NHES components, as in every survey, there was some item nonresponse. 

► Some respondents did not know or could not recall the answers to specific items. 

► Some refused to answer certain questions. 

► Respondent or interviewer error led to erroneous paths that were corrected in the 
data preparation process based on interviewer notes. 

► Some respondents did not finish the entire interview. 

The procedures used to deal with these circumstances are described in the Data File User’s Manual for 
each component. Users will find tables showing item response rates for several questions; the item 
response rates presented in the tables were selected to illustrate the rates for key items, the rates for items 
appearing early and later in the questionnaire, and the range of item response rates. Most of the item 
response rates are very high in the NHES, as in most telephone surveys. 

In addition to item nonresponse, most surveys contain a few data anomalies. These may be real or 
apparent inconsistencies in the data. These are also discussed in the Data File User’s Manuals. 

Analysts can perform their own quality checks on the NHES data relatively easily. A common means 
of doing so is comparing survey estimates to known population totals or to population estimates from 
well-established sources such as the Current Population Survey. Another data quality procedure, 
appropriate to not fully imputed data files such as the NHES:91 components, is to compare item 
respondents to item nonrespondents. An appropriate question the analyst would ask him or herself is 
"Are certain types of respondents more or less likely to refuse a question or to respond that they do not 
know the answer?" 
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4.2. Potential Pitfalls in File Preparation 



Fortune and McBee (1984) have grouped pitfalls in data file preparation into seven categories. These 
seven categories are: 

1) Sample skews; 

2) Merger mortality; 

3) Nonresponse noise; 

4) Variant variables; 

5) Aggregate anomalies; 

6) Time tangles; and 

7) Mechanical misuses. 

In the following text, the applicable pitfalls of file preparation are defined (quoted from Fortune and 
McBee) and then related to the NHES. 

Sample skews "occur when data bases are merged, when data with missing values are used to 
construct variables, when nonresponse and item nonresponse are not reported, when data weights 
are incorrect, and when oversampling is not reported." 

Analysts should review relevant sections of the Data File User’s Manuals to be sure that they understand 
the design of the NHES, including the oversampling of some telephone clusters, the use of sample 
weights, and the appropriate use of replicate weights. In addition, analysts should review sections of the 
User’s Manuals, this document (section 6), and other relevant publications in order to become familiar 
with the issues surrounding telephone coverage of the population. 

In the construction of composite variables or indexes, the analyst should be aware of potential biases 
resulting from missing values. First, missing values, which are coded as negative numbers in NHES data 
files, should be set to missing prior to the construction of new variables. Second, users should assess 
the extent to which patterns of nonresponse may lead to a particular group of respondents being 
underrepresented in a composite index. While item nonresponse is very low in the NHES, such an 
examination will help to uncover any particular difficulties with missing data. This is particularly the 
case for the NHES:91, when full imputation of missing values was not conducted; imputation is discussed 
in section 7. 

Merger mortality "occurs when a large or disproportionate segment of a population has to be 
dropped from the data file on which the study is to be conducted." 

A common problem that might arise in some types of analyses of NHES data is called merger mortality. 
This problem occurs when analysis is restricted to small subgroups of the sampled population, typically 
defined as units that have several characteristics simultaneously. The result is that often a large 
proportion of the total sample size is excluded from the analysis and the included sample may not be large 
enough to adequately address the analytic concern. 

For example, when using the NHES:95 data, analysts might wish to analyze only those households in 
which both the AE and ECPP interviews were completed. The first problem with this type of analysis 
is that it severely reduces the sample size. In addition, the person-weight variables (i.e., AEWEIGHT 
and EWEIGHT) in the NHES:95 data files were designed for a specific population (adults or children) 
and are generally not appropriate for analysis of both adults and children simultaneously. Proper analysis 
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of this type of subgroup might require the development of special weights to meet the specific goals of 
the analysis. 

In other NHES surveys, multiple interviews may have been conducted for a given household, but these 
were not designed to be matched. For example, in the NHES:91 and the NHES:95, a parent may have 
responded to a parent interview about a child, as well as to an Adult Education interview about himself 
or herself. Merging responses from these two different interviews may be problematic, since they were 
not designed to be linked. Likewise, there may not be an Adult Education interview completed for a 
respondent who completed a parent interview and vice versa. 

Nonresponse noise "occurs when nonresponse is systematic or disproportionate, when the reason 
for nonresponse cannot be determined, and when its effect cannot be estimated." 

Both unit (interview) and item (question) nonresponse can introduce nonresponse noise. Unit nonresponse 
effects are more difficult to assess, particularly in a survey using a random digit dialing approach. Few 
items in the NHES have high item nonresponse rates (see Data File User’s Manuals). However, when 
specific types of respondents have higher nonresponse rates, or when underlying circumstances (e.g., a 
child’s poor school performance) lead to item nonresponse, there is a potential for noise in the data. 

Variant variables "occur when there are slight differences in the definitions of a variable across 
two data bases or when the variable is coded in different ways in two data bases.” 

In comparing NHES estimates with estimates from other surveys, analysts may find questions asked in 
different formats or with different response options. In addition, some differences in the wording of 
NHES items may occur from cycle to cycle as part of the ongoing data quality activities of the project. 
Analysts should carefully check the comparability of items across surveys. Example 3 in appendix B 
provides an example of this type of check. 

Aggregate anomalies "occur when the unit of analysis and the level at which measurement 
occurred are different or when the organizational unit used to create a new data file is different 
from the one in the primary data base." 

A potential for aggregate anomalies exists in the NHES when analysts combine responses for one or more 
children for the purpose of conducting family-level analysis. While the NHES:91 ECE component 
sampled all eligible children in a household, a maximum of two children per household were selected for 
each NHES:93 component, for the NHES:95 ECPP component, and for the NHES:96. Analysts should 
keep this in mind when aggregating records to the household level. 

Mechanical misuses "can be defined as human errors that affect data processing and computer 
mechanics. " 

The complexity of the NHES data file structure provides ample opportunity for human error. This is 
ameliorated to some extent by editing procedures conducted online in the CATI system used to collect 
the data and in post-collection editing activities. However, the potential for such error can never be 
eliminated. 

Mechanical misuses can also occur in the analytical stage of data processing. For example, if a person 
were to run frequencies for the entire School Safety and Discipline data file and report a percentage based 
on all records, this would be a mechanical misuse because it does not take into account that there are both 
parent and youth records contained in the file. 
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5 . Selecting Variables for Working Data Sets 



5.1. Developing a Research Plan 

After a researcher 1) understands how the NHES data were collected and processed, 2) avoids the 
common pitfalls in data file preparation, 3) understands the limitations of the data, and 4) studies the 
research issues that can be addressed, he/she is ready to begin developing a research plan. 

The working data file will be used by the researcher to test the research questions that are derived from 
previously conceived conceptual models. Before a working data set is created, the following steps are 
suggested:. 

1 . Develop a research question ~ What does prior research suggest is happening (e.g. , how 
do parents and students perceive the learning environment of the school)? 

2. Determine the predictor, or independent, variables (e.g. , school grade level, school type, 
school size, student’s race/ethnicity, and parents’ highest education) and outcome, or 
dependent, variables (e.g., academic challenge, enjoyment of school, mutual respect 
between pupils and teachers, good discipline maintained by teachers and administrators, 
peer norms that support hard work for achievement, and peer norms that support good 
behavior) that can be used to answer the research question. 

3. Determine what aspects of the research question can be answered with NHES data. If 
there are multiple sources of the data (e.g., parent and youth responses) available, decide 
if only one source or both would be the most analytically appropriate. 

4. Rethink the original research question. If the variables contained on the NHES data files 
cannot be used to study your original research question, rethink the research question and 
either modify the research question or choose another data set. 



5.2. Subsetting Data Files 

Once the above steps have been completed, it is time to create your working data file containing only the 
variables and sample you are interested in examining. The following steps to subsetting are suggested: 

1. According to your research question, determine which variables are needed from the 
NHES data file. 

2. Determine whether your analysis calls for the subsetting of the data by population, e.g., 
adults in the labor force, children of mothers in the labor force, or youth in grades 9 
through 12. 

3. Use SAS or SPSS to select the appropriate cases and variables from the file. This is 
easily done using the NHES:91/93/95/96 Electronic CodeBook (ECB) which permits 
users to generate SAS or SPSS code that will create an extract data file that includes a 
subset of the cases and/or variables included in an NHES data file. See the ECB User’s 
Guide for additional information. Also see example 1 in appendix B for an illustration 
of how to check that a data subset was created correctly. 
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5.3. Derived Variables 



Derived variables were developed and included in the public use data file to aid users in analysis. The 
derived variables fall into three categories: questionnaire item variables, counter variables, and variables 
linked to other data sources (only applicable to NHES:93, NHES:95, and NHES:96). Questionnaire 
item-derived variables were created by combining two or more items from the questionnaire. Counter- 
derived variables were created by counting the number of persons enumerated in the household with 
specific characteristics. In the NHES:93, NHES:95 and NHES:96, linked-derived variables were created 
by using the respondent’s ZIP Code or telephone number to extract data from other data sources, most 
notably the 1990 Census of Population Summary Tape File 3B (STF3B). The derived variables are on 
the file in alphabetical order. 

In the NHES:93 School Safety and Discipline file, most derived variables come from parent responses 
and are included on the youth record to expedite analysis. The exceptions are FEARP, KNOWP, 
VICTIMP, and WITNESSP, which were derived from parent responses and are found on the parent 
record only, and FEARY, KNOWY, VICTIMY, and WITNESSY, which were derived from youth 
responses and are on the youth record only. 

In the NHES:93 and the NHES:95, all of the variables that begin with the prefix ZIP were taken from 
the 1990 Census of Population STF3B. All unique NHES:93 and NHES:95 ZIP Codes were matched 
to ZIP Codes on the STF3B for urbanicity, percent black or Hispanic, and percent of persons under age 
18 living in poverty. 



5.4. Subgroup Analysis 

The NHES is designed to support examination of some specific policy-relevant subgroups. One such 
subgroup is minorities. Hispanics and blacks were selected at a higher than normal rate (oversampled) 
in order to improve national estimates. The data file can be subsetted to include only the specific group 
or groups of interest. In the Electronic CodeBook (ECB) for the NHES:91 through the NHES:96, special 
screens are provided in the data extraction menu that permit users to subset the data using a small number 
of variables that are commonly of interest. See the ECB User’s Guide for additional information. 

Data users should be aware that while poststratification techniques were found to eliminate the vast 
majority of bias that may be due to telephone undercoverage, it is not entirely possible to do so. Adults 
who have not finished high school and households with very low incomes are less likely than others to 
live in telephone households. Therefore, users should carefully consider possible bias when conducting 
analyses that focus on those with very low income. Telephone coverage bias and techniques used to 
correct for it are discussed in coverage reports cited in section 1.4. 



5.5. Dyad Analysis 

The NHES: 93 SS&D component permits the analyses of data from parent and youth dyads, that is, 
parents and youth from the same household who completed parent and youth interviews, respectively. 
The variables PARNYUTH and MAINRSLT in the NHES:93 SS&D data file should be used to identify 
parent and youth cases that have associated youth and parent interviews. To analyze data directly 
comparing a parent and his/her youth’s responses in the NHES:93 SS&D component, the analyst should 
rename the variables from one interview so as not to cause overwriting of values when the records are 
merged. For example, both parent and youth records contain the variable SSSTEAL. The values on the 
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parent records are from parent responses to the question, and the values on the youth record are from 
the youth responses to the question. For an analysis that includes both parent and youth, the variable 
name should be changed so it will be unique for parent and youth. For example, SSSTEAL could be 
changed to SSSTEALP on the parent record and SSSTEALY on the youth record. If the original variable 
name has eight characters, the P or Y will replace the last character. Because each variable will then 
have a unique name, the original values will remain when parent and youth records are merged. 

In the NHES:96, every youth record also has a matching parent record, and the Parent PFI/CI and Youth 
Cl components contain some items in common. Parents and youth were asked about the learning 
environment at school, watching the national news together, and discussing politics or national issues. 
In addition, youth in 9th through 12th grade were asked questions measuring political attitudes and 
questions about skills that could be used in civic life that were identical to questions asked of their 
parents. Parents and 9th through 12th grade youth were also asked one of two five-item sets of 
knowledge questions, but the set administered to the parent respondent was selected at random and the 
youth in that household was administered the other set. 

Users of the NHES:96 who wish to conduct analyses on parent-youth dyads can match cases using 
ENUMID, which is on both the Parent PFI/CI and Youth Cl files. Instructions for linking data files are 
in the NHES:96 Data File User’s Manual. The variables on the Parent PFI/CI and Youth Cl data files 
have unique names, so renaming is not necessary. 
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6. NHES Design 



6.1. Overview of Design 

The NHES was developed to provide reliable estimates for the different topical components of each 
administration. The inclusion of two survey components made the overall survey more cost effective, 
thus allowing for larger sample sizes and more precise estimates. This strategy was key to the NHES 
design. By including more than one topic within the framework of a single survey, the cost of screening 
households to find those eligible for the study could be partitioned over the component surveys. 

Another general feature of the NHES was developed in response to concerns about the potential demands 
placed upon those who respond to multiple survey components. With the introduction of multiple surveys 
within a single framework, the possibility of increasing response burden on the members of the sampled 
households arose. It is possible that the same household member could be selected to respond to more 
than one interview and/or that more than one household member could be sampled. 

Even though sampling methods reduced the number of interviews per household, the length of the 
interview was considered to be a critical factor in obtaining high response rates and reliable estimates. 
Therefore, the number of items included in the NHES was limited in order to help improve response rates 
and reduce the demands made on survey respondents. 

Because of the above requirements, complex sampling techniques, and the need for quick and accurate 
administration, the NHES was conducted using CATI technology. Some of the advantages of CATI for 
the NHES included improved project administration, online sampling and eligibility checks, scheduling 
of interviews according to a priority scheme to improve response rates, managing data quality by 
controlling skip patterns and checking responses online for range and consistency, and an online "help" 
function to answer interviewers’ questions. 

Several different interview instruments were used in each cycle of the NHES. For both the NHES:91 
and NHES:93, three instruments were used; for the NHES:95 and NHES:96, four instruments were used. 
These instruments included screening interviews and the extended interview topical components. Items 
within each of the instruments were programmed so that the appropriate items appeared on the 
interviewer’s computer screen corresponding to the respondent’s answers to previous queries. 



6.2. Household Level Sampling 

The sampling method used for the NHES:91 and NHES:93 is a variant of random digit dialing (RDD) 
procedures described in Waksberg (1978). The original Mitofsky-Waksberg method produces an equal 
probability sample of households with telephones and requires a smaller number of telephone calls than 
the sampling procedures previously used for RDD. A time-saving variant of this method, referred to as 
the "modified Waksberg procedure," was used for both the NHES:91 and NHES:93. The modified 
method is described in Brick and Waksberg (1991). Beginning in 1995, the NHES moved to a list- 
assisted sampling approach (Casady and Lepkowski 1993). This method reduces the number of 
unproductive calls to nonworking or nonresidential numbers (compared with simple random sampling of 
all numbers), produces a self-weighting sample, is a single stage and unclustered sample, and eliminates 
the sequential difficulties associated with the Mitofsky-Waksberg method. However, a disadvantage of 
the list-assisted method is that it incurs a coverage bias because not all telephone households are included 
in the sampling frame. 
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For the NHES:91 and NHES:93, the first step in the sampling process was to form a list of all existing 
telephone area codes and prefix numbers for the 50 states and the District of Columbia (a prefix number 
is a 3-digit telephone exchange). The lists used for the NHES samples were the Bellcore tapes for the 
October preceding data collection. All possible combinations of 2-digit numbers were then added to these 
numbers to create a list of all the possible first 8 digits of the 10 digits in telephone numbers. These 8- 
digit numbers were treated as Primary Sampling Units (PSUs), or telephone clusters. 

A random sample of PSUs was selected. A prime telephone number was formed by adding a random 
two-digit number to the eight-digit cluster. The prime number was then dialed to determine if it was 
residential. If it was residential, the PSU was retained in the sample. If the prime number was not 
residential, then the PSU was rejected and no further calls within the PSU were made. Additional PSUs 
were selected in the same way. 

A random sample of telephone numbers within each of the retained "residential" PSUs was selected by 
adding random two-digit combinations to the original eight numbers. Interviews were attempted at the 
prime number and at as many additional numbers required to obtain the desired expected sample size. 
The total expected sample size was m(k+l), where m was the number of residential PSUs and (k+1) 
was the number of telephone numbers sampled in each PSU. 

The households were sampled within clusters in order to effect a significant cost savings. With this 
method of cluster sampling, the number of telephone numbers that need to be dialed is at least 50 percent 
less than what would be needed if all telephone numbers were dialed at random. However, the variances 
of the estimates were increased, typically by less than 10 percent, due to the clustering of the sampled 
households within the PSUs. 

The sampling method for the NHES:9l and NHES:93 used a fixed number of telephone numbers per 
PSU, rather than a fixed number of households per PSU, as used in the Mitofsky-Waksberg method. The 
statistical properties of this method are described in detail by Brick and Waksberg (1991). The main 
advantage of this method is that it does not require sequential modification to the within-PSU sample size. 

The list-assisted sampling used in the NHES:95 and NHES:96 was conducted by stratifying telephone 
numbers by the type of 100-bank they fall within (all the numbers in a 100-bank have the same first 8 
digits of the 10-digit telephone number). An equal probability random sample of telephone numbers was 
selected from all telephone numbers that were in 100-banks with at least one White Page directory-listed 
telephone number (called the listed stratum). Telephone numbers in 100-banks with no listed telephone 
numbers (called the zero-listed stratum) were not sampled. The telephone numbers in the listed stratum 
included both listed and unlisted numbers provided there was at least one telephone number in the 100- 
bank that was listed. 



With the list-assisted approach, a coverage bias arises because households in the zero-listed stratum have 
no chance of being included in the sample. Empirical findings were presented by Brick, Waksberg, Kulp, 
and Starer (1995) to address the question of coverage bias. These results show that the percentage of 
telephone numbers in the zero-listed stratum that are residential is very small (about 1.4 percent), and 
about 3 to 4 percent of all telephone households are in the zero-listed stratum. Furthermore, the bias 
resulting from excluding the zero-listed stratum is generally small. 

One of the goals of the NHES is to produce reliable estimates for subdomains defined by race and 
ethnicity. In fact, estimates by race and ethnicity were key in developing the sample sizes for each of 
the administrations of the NHES. In a 64,000-household design in which every household has the same 
probability of being included, the number of completed interviews would not be large enough to produce 
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reliable estimates of many characteristics of black and Hispanic youth. Therefore, blacks and Hispanics 
had to be sampled at higher rates to improve the reliability of estimates for these subpopulations. 

In each administration of the NHES, exchanges with higher concentrations of blacks and Hispanics have 
been oversampled. Mohadjer and West (1992) showed that this method was successful in reducing the 
variances for estimates of characteristics of blacks and Hispanics by approximately 20 to 30 percent over 
a range of statistics examined. The decreases in precision for estimates of the groups that were not 
oversampled and for estimates of totals were modest, ranging from about 5 to 15 percent. 

A computer file containing census characteristics for telephone exchanges was used to stratify telephone 
exchanges into low- and high-minority concentration strata. Any telephone exchange not found on the 
file was assigned to the low-minority concentration stratum. 

The specific design defined high-minority concentration areas as exchanges having at least 20 percent 
black or 20 percent Hispanic persons (or 20 percent Asian/Pacific Islander persons for the NHES:93) 
living in the area. The telephone exchanges in the two strata were identified and a systematic sample was 
drawn in each stratum. The sampling fraction used in the high-minority concentration stratum was two 
times the fraction used in the low-minority concentration stratum. 1 

Oversampling by the characteristics of the telephone exchange had two effects. First, the oversampling 
increased the sample sizes for minorities because they were more heavily concentrated in the exchanges 
that were oversampled. Therefore, the sampling errors for estimates of these groups were reduced due 
to the increased sample size. On the other hand, not all minorities were found in the oversampled 
exchanges. Thus, differential sampling rates were applied to persons depending on their exchanges. 
Using differential rates increased the sampling errors of the estimates. These increases partially offset 
the benefit of the larger minority sample sizes. 

In the NHES: 96, the goal of making estimates at the state level for characteristics of household members 
and for household library use also determined the number of telephone numbers selected. A target of 
500 screened households per state was set. A sample of 500 households is large enough so that if 30 
percent of the households in a state have the characteristic then differences of 6 percent can be detected. 
Due to nonresponse at the Screener level and lower residency rates than expected in some states, 500 
Screeners were not completed in some states. The lower number of responses limits the ability to make 
estimates for some subgroups within states. Analysts should examine the standard errors for subgroups 
of interest in their analysis to evaluate the precision of within-state estimates. 



6.3. Topical Component Samples 

The NHES:91, NHES:93, NHES:95, and NHES:96 each included two survey topics. The NHES:91 AE 
sampled households for adults (age 16 years or older) who participated in at least one adult education 
course or activity during the past 12 months (including full-time, degree-seeking students) and adults who 
had not participated in the past 12 months. For the ECE survey, two specific populations of 3- to 8-year- 
old children were sampled: those who had not yet enrolled in primary school and those who were 
currently enrolled in primary school (or were 6 years of age or older and receiving home schooling or 
education in alternative programs). In addition, 9-year-olds who were enrolled in second grade or below 



'Research was done for the NHES Field Test of 1989, the NHES:91, and the NHES:93 that tested the effects of different 
sampling plans and definitions of high minority strata on sample sizes and variances of estimates. This research led to 
implementing the procedures just described for oversampling telephone numbers in high minority areas. 
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were also included to improve estimates of retention in early grades. The age-eligibility was determined 
by the child’s age on December 31, 1990. 

In the NHES:93 SR component, data were collected about children aged 3 through 7 and in second grade 
or below. Those children who were 8 or 9 years old, but who were enrolled in first or second grade, 
were also eligible for the SR survey. The age-eligibility was determined by the child’s age on December 
31, 1992. In the SS&D component, data were collected about children enrolled full time in any grade 
3 through 12. The youth was usually between 8 and 20 years old, as determined by his/her age on 
December 31, 1992. (Youths who were age 21 and enrolled in 12th grade or below were sampled at the 
Screener. If, at the beginning of the extended interview, it was determined that the youth was over 20 
on December 3 1 , 1992, the interview was terminated.) A subsample of youth in 6th through 12th grades, 
generally age 11 and older, were also interviewed about their school experiences. There were some 
interviews with "emancipated youth," that is, youth who were enrolled full time in 6th through 12th grade 
and living independently from any parent/guardian. There are relatively few emancipated youth cases 
(n=77); all were 16 to 20 years old and enrolled in 10th through 12th grades. These youth were asked 
the same questions regarding school experiences as other youth, plus some questions pertaining to child, 
family, and household characteristics that normally would have been answered by the parent/guardian. 

In the NHES:95 ECPP component, data were collected about children aged 10 or younger and in third 
grade or below. The age-eligibility was determined by the child’s age on December 31, 1994. For the 
NHES:95 AE component, adults age 16 or older and not enrolled in elementary or secondary school and 
not currently serving in the U.S. Armed Forces were sampled for interviews. The probability of an adult 
being sampled depended upon whether he or she had a high school diploma and whether he or she had 
taken any classes in the previous 12 months. One adult was selected for the NHES:95 AE Splice sample 
interview from each splice sample household. 

There were three populations of interest for the NHES:96: children aged 3 through 20 on December 31, 
1995 enrolled in grade 12 or below, whose parents responded to PFI items and, for parents of 6th 
through 12th graders, also to Cl items; students in grades 6 through 12, who responded to Cl items and 
to a small number of PFI items; and adults, defined as persons 18 years old or older, not enrolled in 
grade 12 or below, and not on active duty in the military, whose responses to Cl items provided estimates 
representative of all civilian U.S. adults. All sampled households were included in the population of 
interest for the Screener/Library interview, except those in which all members were on active duty in the 
armed forces. 
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7. Working with Missing Data 



7.1. Sources of Missing Data 

As in most surveys, the responses to some data items are not obtained for all interviews. There are 
numerous reasons for item nonresponse. Some respondents do not know the answer for the item or do 
not wish to respond for other reasons. Some item nonresponse arises when an interview is interrupted 
and not continued later, leaving items at the end of the interview blank. Item nonresponse may also be 
encountered because responses provided by the respondent are not internally consistent, and this 
inconsistency is not discovered until after the interview is completed. In these cases, the items that are 
not internally consistent were set to missing. 

For most of the data items collected in the NHES, the item response rate was very high. In the 
NHES:91, missing data were imputed for those variables required for weighting or contributing to the 
derived variables. In the NHES:93, NHES:95, and NHES:96, all of the data items with missing data 
on the file were imputed. Thus, for the NHES:93, NHES:95, and NHES:96 the only missing values 
remaining are those that indicate legitimate skips (see example 2 in appendix B for an illustration of this). 
The imputations were done for two reasons. First, certain variables were used in developing the national 
estimates and complete responses were needed for this purpose. These included the variables used for 
raking and for developing sampling weights. Second, some data items were expected to be analytical 
variables in many of the publications from the surveys and complete item responses helped to improve 
these presentations. 



7.2. Imputation Procedures in 1991 

As stated above, the NHES:91 was the only NHES survey for which only a subset of the data items were 
imputed. For variables that were imputed in the NHES:91, a nearest-neighbor, hot-deck procedure was 
used to impute missing responses. In this approach, the entire file was placed into a specified sort order 
that varied depending on the data items to be imputed. The sort order was determined by attempting to 
group respondents into those most likely to have the same response for the data item to be imputed. For 
example, consider imputing the variable DADGRADE, the highest level of education completed by the 
child’s father. The sort variables for the imputation were the child’s parents’ marital status 
(PARNMARI), the mother’s education level (MOMGRADE), and whether the father was employed 
(DADWORK). The use of these sort variables in combination assured that adjacent cases on the file were 
similar on all of these characteristics. 

Whenever a case with a missing value was encountered, the value of the data item from the preceding 
complete case was imputed for the missing item. The method is called a nearest-neighbor, hot-deck 
approach because the value from the closest record (the nearest-neighbor) in the current data set (hot- 
deck) is used to replace the missing item. Thus, in the example above, the value for a missing 
DADGRADE was imputed from a case with the same responses for the sort variables (PARNMARI, 
MOMGRADE, and DADWORK) whenever possible. 

For each imputed data item, an imputation flag variable was created. If the response for this item was 
imputed, then the imputation flag was set equal to 1, otherwise it was set to -1 (inapplicable). The flag 
was created to enable users to identify imputed values. The use of these flags is discussed in section 7.6. 
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7.3. Imputation Procedures in 1993 



In the NHES:93, a slightly different hot-deck procedure was used to impute missing responses. In this 
approach, the entire file was sorted into cells defined by characteristics of the respondents. The variables 
used in the sorting were general descriptors of the interview and also included any variables involved in 
the skip pattern for the items. This portion of the procedure was very similar to the sorting used in the 
NHES:91. 

All of the observations were sorted into cells defined by the responses to the sort variables, and then 
divided into two classes within the cell depending on whether or not the item was missing. For an 
observation with a missing value, a value from a randomly selected donor (observation in the same cell 
but with the item completed) was imputed for the missing value. After the imputation was completed, 
edit programs were run to ensure the imputed responses did not violate edit rules. 

Because editing was being finished at the same time as imputation was occurring, there were some 
logically inconsistent values, newly missing values, and imputed values that were out of range. These 
values were set to missing during editing. Further imputations were then necessary. A simplified manual 
imputation was used for these missing values because there were so few imputations recoded. The 
distribution of the completed data was used to draw donors for this manual process. Thus, for these 
newly missing values, the standard sort variables were not used to control the process. 

The general imputation procedures were not used for several variables that were collected only once per 
household or involved complex relationships. The AGE1 through AGE9, SEX1 through SEX9, 
RELATN1 through RELATN9 (only on the SR file), and CRELN1 through CRELN9 (only on the SR 
file) household membership items were manually imputed for the very few cases that had missing values 
because of the need to ensure consistency in household relationships. 

ZIP Code values were imputed once at the household level and then included in the SR and the SS&D 
files. Some 263 households were missing the ZIP Code, and another 12 households gave ZIP Codes that 
did not match ZIP Codes on the 1990 Census of Population STF3B used to create derived variables. 
These ZIP Codes, which affected 390 interviews, were imputed by replacing them with ZIP Codes that 
were on the STF3B file. 

When the hot-deck imputation procedures were completed, the 3 SR items and 12 SS&D items with 
response rates of less than 95 percent were further examined. A search was conducted to find correlated 
variables that could be used in place of the standard sort variables for these items. If useful correlates 
were identified, they were used in the hot-deck imputation for these items. 

For each data item for which any values were imputed, an imputation flag variable was created. If the 
response for the item was imputed, the imputation flag was set equal to 1 (or 2 — see next paragraph); 
otherwise it was set to 0. There were no imputation flags created for AGE92, GRADE, DPAFRAID, 
and HNPUBL4 since there was no imputation done for these variables. The flag was created to enable 
users to identify imputed values. 

Analysts of the SS&D data file might find a "don’t know" response to be analytically useful for some 
items from the parent interview. A parent response of "don’t know" may indicate lack of interest or 
involvement in the child’s school experiences, whereas a youth response of "don’t know" to the same 
question has different implications. To support this analytic objective, the imputation flag was set to the 
value 2 for a "don’t know" response that was imputed for selected parent items. A list of these items 
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is contained in the NHES:93 School Safety and Discipline Data File User’s Manual. The use of these 
flags is discussed in section 7.6. 



7.4. Imputation Procedures in 1995 

The imputation procedures used in the NHES:95 were similar to those used in the NHES:93. A hot-deck 
procedure was used in which the entire file was sorted into cells defined by characteristics of the 
respondents. The variables used in the sorting were general descriptors of the interview and also included 
any variables involved in the skip pattern for the items. 

All of the observations were sorted into cells defined by the responses to the sort variables, and then 
divided into two classes within the cell depending on whether or not the item was missing. For an 
observation with a missing value, a value from a randomly selected donor (observation in the same cell 
but with the item completed) was imputed for the missing value. After the imputation was completed, 
edit programs were run to ensure the imputed responses did not violate edit rules. 

For some items, the missing values were imputed manually rather than using the hot-deck procedure. 
This happened most often when the variable was collected only once for the household or involved 
complex relationships (e.g., variables indicating the ages and relationships of household members). 
Manual imputation was also used if edit failures were found after the hot-deck imputations were 
completed. 

Some additional measures were taken to impute the 42 ECPP and the 17 AE variables that had item 
response rates of less than 90 percent. To improve the imputation for these items, additional sort 
variables were added to the standard sort variables for the hot-deck imputation. Additional sort variables 
were also included in the hot-deck imputation of age-related variables (e.g., the age at which children first 
started attending a center-based program). 

For each data item for which any values were imputed, an imputation flag variable was created. If the 
response for the item was not imputed, the imputation flag was set equal to 0. If the response was 
imputed, the flag was set to either 1, 2, or 3. An additional code of 4 was utilized for some imputation 
flags in the ECPP file. The value of the imputation flag indicates the specific procedure used to impute 
the missing value. The imputation flag was typically set to 1 if the missing value was imputed using the 
standard hot-deck approach. In some cases, variables had to be recoded to be consistent with the skip 
patterns of the questionnaire prior to being imputed using the standard hot-deck approach. For these 
cases the imputation flag was set to 2. For some items with complex skip patterns and only a few 
missing values, the item was imputed manually and the flag was set to 3. In the ECPP file, the 
imputation flag was set to 4 if the reported value was "don’t know" prior to imputation using the standard 
hot-deck approach. Code 4 was utilized for only a subset of ECPP variables for which a "don’t know" 
response might be considered analytically meaningful, specifically, items concerning parent knowledge 
of care provider or program characteristics. 



7.5 Imputation Procedures in 1996 

The imputation procedures used in the NHES.96 were similar to those used in the NHES:95. A hot-deck 
procedure was used in which the entire file was sorted into cells defined by characteristics of the 
respondents. The variables used in the sorting were general descriptors of the interview and also included 
any variables involved in the skip pattern for the items. 
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All of the observations were sorted into cells defined by the responses to the sort variables, and then 
divided into two classes within the cell depending on whether or not the item was missing. For an 
observation with a missing value, a value from a randomly selected donor (observation in the same cell 
but with the item completed) was used to replace the missing value. After the imputation was completed, 
edit programs were run to ensure the imputed responses did not violate edit rules. 

For some items, the missing values were imputed manually rather than using the hot-deck procedure. 
In the NHES:96, hand imputation was done (1) to impute person-level characteristics from the Screener; 
(2) to impute Parent PFI/CI interview variables for children who are home schooled; (3) to impute 
variables that involved complex relationships that would have required extensive programming to impute 
using a hot-deck procedure; (4) to correct for inconsistent imputed values; and (5) to impute for a few 
cases when no donors with matching sort variable values could be found. 

After values had been imputed for all observations with missing values, the distribution of the item prior 
to imputation (i.e., the respondents’ distribution) was compared to the post- imputation distributions of 
the imputed values alone and of the imputed values together with the observed values. This comparison 
is an important step in assessing the potential impact of item nonresponse bias and ensuring that the 
imputation procedure reduces this bias, particularly for items with relatively low response rates (less than 
90 percent). The Household and Library had 5 items with less than a 90 percent response rate, the Parent 
PFI/CI had 27 items, the Adult Cl had 5 items, and the Youth Cl had 5 items. 

For each data item for which any values were imputed, an imputation flag variable was created. If the 
response for the item was not imputed, the imputation flag was set equal to 0. If the response was 
imputed, the flag was set to either 1, 2, 3, or 4. The value of the imputation flag indicates the specific 
procedure used to impute the missing value. The imputation flag was typically set to 1 if the missing 
value was imputed using the standard hot-deck approach. In some cases, variables had to be recoded to 
be consistent with the skip patterns of the questionnaire prior to being imputed using the standard hot- 
deck approach; for these cases, the imputation flag was set to 2. For items that were imputed manually, 
the flag was set to 3. The imputation flag was set to 4 for all variables when the original response had 
been "don’t know" except for the political knowledge questions. "Don’t know and refused responses 
to the political knowledge questions were regarded as legitimate responses and were not imputed. 



7.6 Purpose of Imputation Flags 

The imputation flags were created to enable users to identify imputed values. Users can employ the 
imputation flag to delete the imputed values, use alternative imputation procedures, or account for the 
imputation in computation of the reliability of the estimates produced from the data set. For example, 
some users might wish to analyze the data with the missing values rather than the imputed values. If 
there is no imputation flag corresponding to the variable, no values for that variable were imputed. If 
the imputation flag corresponding to the variable is equal to 1, 2, 3, or 4, the user can replace the 
imputed response with a missing value to accomplish this goal. This method could also be used to 
replace the imputed value with a value imputed by some user-defined imputation approach. Finally , if 
the user wishes to account for the fact that some of the data were imputed when computing sampling 
errors for the estimates, the missing values could be imputed using multiple imputation methods (Rubin 
1987) or imputed so that the Rao and Shao (1992) variance procedures could be used. 
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8. Weights and Estimation Procedures 



8.1. Need for Special Procedures 

In most standard statistical textbooks and software, the analyst assumes that the data are a simple random 
sample from some distribution. In the NHES and most other sample surveys, this assumption is incorrect 
because the sampled units are selected using techniques such as clustering, stratification, and unequal 
probabilities of selection. These sampling methods are used because they greatly reduce the cost of data 
collection and produce efficient and unbiased estimates of the population if the appropriate methods are 
used in the analysis stage. 

In all NHES sample designs, telephone numbers were stratified and selected with unequal probabilities 
of selection. Persons within the households were stratified and selected with varying probabilities. 
Estimation techniques, reflected in the sampling weights, were used to make the estimates from the NHES 
consistent with population control totals and these estimation methods also served to reduce the variability 
of the estimates and remove some of the potential biases in the estimates, especially bias due to 
undercoverage. 

All of these sampling and estimation techniques have consequences for the analytic methods that should 
be applied in making estimates from the NHES data. One of the most important features is that the 
sampling weights should always be used when making estimates of the population. These weights are 
important not only for estimates of totals (unweighted procedures are not reasonable for estimates of 
totals), but also for estimates of means and proportions. As noted above, the sampling and estimation 
procedures used in creating the sampling weights adjusted for the most important sources of biases, and 
unweighted methods of estimating do not include these adjustments. 

The sampling and estimation procedures also have an important impact on the estimates of the reliability 
of the estimates from the NHES. The standard errors of the estimates (or the variance of the estimate 
which is just the square of the standard error) are affected by these procedures . If the standard errors 
are computed using standard statistical software such as SAS or SPSS, they will underestimate the actual 
standard errors for most estimates because the data are not a simple random sample (the data are 
correlated and sampled with unequal probabilities). 

The role of the sample design and estimation procedures in computing estimates and the standard errors 
of the estimates is discussed in a more general setting by Kish (1992). All analysts are urged to take 
these features of the NHES into account when producing estimates, whether they are simple estimates of 
means or more complex estimates of correlations and regression coefficients. 



8.2 Recommended Statistical Procedures 

The recommended methods for producing estimates and their standard errors from the NHES are 
discussed below. 

8.2.1 Software Available to Produce Weighted Estimates 

Most standard statistical software has the capability of handling sample survey weights in calculating 
estimates. (The table in appendix C shows the full sample weight variable for each NHES data file.) 
For example, in SAS the WEIGHT statement can be used with SAS procedures to produce estimates 
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using the sampling weights. While the estimates of the characteristics are appropriate, the standard errors 
of the estimates will not be correct because the procedures assume that the data are from a simple random 
sample. The WesVarPC and SUDAAN software packages (described below in 8.2.2) are able to compute 
appropriate standard errors along with weighted estimates. 



8.2.2 Software Available to Produce Appropriate Standard Errors 

The preferred method of producing unbiased estimates of the population and valid estimates of the 
standard errors of the estimates is to use software designed with this purpose in mind. The two major 
methods of proceeding are by using either replication methods or Taylor series approximations. Special 
software is available for both methods and the NHES data supports either type of analysis. 

We recommend the replication method for the NHES. The replication method involves splitting the entire 
sample into a set of groups, or replicates, based on the actual sample design of the survey. The survey 
estimates can then be estimated for each of the replicates by creating replicate weights that mimic the 
actual sample design and estimation procedures used in the full sample. The variation in the estimates 
computed from the replicate weights can then be used to estimate the sampling errors of the estimates 
from the full sample. Replicate weights have been included in all the NHES data files to make this 
application relatively simple. The table in appendix C lists the replicate weight variables, as well as the 
specific jackknife replication method used to create the replicates, for each NHES data file. 

WesVarPC is software developed for the PC for producing estimates and their standard errors using 
replication methods. The replication method is especially useful for the NHES because this is the only 
method that accounts for both nonresponse adjustments and the raking adjustments to the population 
control totals in the estimation of the standard errors. WesVarPC currently supports a wide variety of 
estimates (totals, means, proportions, ratios, and user-defined functions of estimates) as well as 
procedures for estimating linear and logistic regression coefficients. WesVarPC can read SAS (version 
6.04), SAS Transport, and SPSS for Windows system data files 2 . As of June 1997, the most current 
version of WesVarPC is 2.12. Please use this or a more current version. 

The WesVarPC software is available free of charge through the Internet (http://www.westat.com) or by 
sending an e-mail message to wesvar@westat.com. Those interested in obtaining a copy of WesVarPC 
may also write to: Maida Montes, Westat, Inc., 1650 Research Boulevard, Rockville, MD 20850. 

The Taylor series approach can also be used for the NHES. The two most commonly used software 
packages for this approach are SUDAAN and PCCARP. Both of these programs are for the PC and can 
be used to compute estimates of totals, means, and proportions as well as linear and logistic regression 
coefficients. Neither can account for nonresponse or raking adjustments to the weights, but for many 
estimates these adjustments are not critical for estimating the standard errors. The table in appendix C 
shows the proper design and nesting specifications to use in SUDAAN for each NHES data file. 

SUDAAN is available through the Research Triangle Institute. Information on obtaining the software, 
including cost information, can be obtained by writing to Dr. Babu Shah, Research Triangle Institute, 
PO Box 12194, Research Triangle Park, NC 27709. Information on PCCARP, including costs, can be 
obtained by writing to Dr. Wayne Fuller, Department of Statistics, Iowa State University, Ames, IA 
50010. 
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An approach that is frequently used for complex analysis such as regression is to use the standard 
statistical software for exploratory data analysis and model fitting. Once the model is formulated, the 
appropriate analysis using WesVarPC or SUDAAN is used to estimate the parameters and the standard 
errors. This method is frequently used by analysts who are very familiar with a particular software 
package and feel more comfortable with using it during the exploratory stage. This is often a very 
reasonable compromise, since the final estimates are. produced using the appropriate software. One 
disadvantage is that the exploratory analysis is done without the benefit of the fully valid estimates of the 
standard errors, but this is often not a problem in this type of analysis. 



8.2.3 An Alternative Method for Producing Estimates and Standard Errors 

Not all analysts will follow the recommended method of estimating standard errors. A common 
alternative approach and its likely consequences are discussed briefly below. In specific applications, this 
alternative may be valid and useful; however, for general purposes it has some shortcomings and may 
lead to statements that are not supported by the data. 

This alternative method of analysis is to use a standard statistical package and then adjust the resulting 
standard errors by an average design effect. Most statistical software packages compute standard errors 
of the estimates based upon simple random sampling assumptions. The standard error from this type of 
statistical software can be adjusted for the complexity of the sample design to approximate the standard 
error of the estimate under the actual sample design used in the survey. For example, the variance of 
an estimated proportion in a simple random sample is the estimated proportion (p) times its complement 
(1-p) divided by the sample size (n). The standard error is the square root of this quantity. This estimate 
can be adjusted to more closely approximate the standard error for the estimates from the NHES. 

A simple approximation of the impact of the sample design on the estimates of the standard errors of the 
estimates that has proved useful in NHES surveys and in many other surveys is to adjust the simple 
random sample standard error estimate by the root design effect (DEFT). The DEFT is the ratio of the 
standard error of the estimate computed using the replication method discussed above to the standard 
error of the estimate under the assumptions of simple random sampling. An average DEFT is computed 
by estimating the DEFT for a number of estimates and then averaging. In complex sample designs, like 
those used in the NHES, the DEFT is typically greater than unity due to the clustering of the sample and 
the differential weights attached to the observations. A standard error for an estimate can be 
approximated by multiplying the simple random sample standard error estimate by the average DEFT. 
Each NHES Data File User’s Manual describes the calculation of the average design effect. The table 
in appendix C shows the average design effect for each NHES component. The next few paragraphs 
provide some examples of approximating standard errors using the average design effect. 

Proportions. Suppose that the NHES: 95 ECPP data is used to obtain a weighted estimate of 60 percent 
for some characteristic (for example, suppose that 60 percent of children participate in some type of child 
care arrangement). An approximate standard error can be developed in a few steps. First, obtain the 
simple random sampling error for the estimate using the weighted estimate in the numerator and the 
unweighted sample size in the denominator: the standard error for this 60 percent statistic would be the 
square root of ((60 x 40)/14,064) = 0.41, where the weighted estimate is 60 percent (p), 40 is 100 minus 
the estimated percent (100-p), and the unweighted sample size is 14,064 (n). The approximate standard 
error of the estimate from the NHES: 95 ECPP data is this quantity (the simple random sample standard 
error) multiplied by the DEFT of 1.2. In this example, the estimated standard error would be 0.49 
percent (1.2 x 0.41). 
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Means. The approximate standard error for a mean can be developed using a related procedure. First, 
the mean is estimated using the full sample weight in a standard statistical package like SAS or SPSS. 
Second, the simple random sample standard error is obtained through a similar, but unweighted, analysis. 
Third, the standard error from the unweighted analysis is multiplied by the DEFT to approximate the 
standard error of the estimate under the NHES design. For example, suppose that in the NHES:95 ECPP 
data, the estimated (weighted) mean number of hours per week in nonparental care is 20 and the simple 
random sampling standard error (unweighted) is 5 hours. Then, the approximate standard error for the 
estimate would be 6 hours (5 x 1.2). 

Parameter Estimates of Regression Models. Users who wish to adjust the standard errors for parameter 
estimates of regression models should follow a procedure similar to that discussed for means, above. 
Specifically, the parameter estimates of the model can be estimated using a weighted analysis in a 
standard statistical software package such as SAS or SPSS. A similar, but unweighted, analysis will 
provide the simple random sample standard errors for these parameter estimates. The standard errors 
can then be multiplied by the DEFT to arrive at the adjusted standard error for the NHES design. For 
example, if a given variable in the NHES:95 ECPP data has a weighted estimate of 2.334 and an 
unweighted standard error of 0.45, then the adjusted standard error would be 1.2 x 0.45 = 0.54. 

This method of approximating standard errors increases the standard errors of the estimates. Some 
research (Kish and Frankel 1974) suggests that the standard errors for more complex estimates such as 
regression coefficients may not be subject to design effects that are as large as that of statistics such as 
means and proportions. Thus, this alternative may lead to overestimating the standard errors of the 
estimates, but this error may be less problematic because it leads to confidence intervals and tests that 
still satisfy the nominal level. 
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NHES:91/93/95/96 

Commonly Asked Questions and Answers 

Are the subjects in one NHES survey component related to subjects in other components? 

The NHES is a repeated cross-sectional survey system, and each survey year uses an independent sample. 
Therefore, the sample members from one year (e.g., the NHES:91) are independent from those another 
year (e.g., the NHES:93, the NHES:95, or the NHES:96). 

Some respondents or subjects within one survey year are related to one another. This is because 
households may have had members sampled for more than one component in a given survey cycle, or 
more than one household member may have been eligible for a single survey component. 

For example: 



■ In the NHES:91, all eligible children were sampled. As a result, some subjects in the 
ECE component are siblings, cousins, or other relatives living within the same household. 

■ In the NHES:91, it is possible that household members were sampled for more than one 
component, that is, an adult may have been sampled for adult education and a child (or 
children) may have been sampled for ECE. The sampling rate for the AE component was 
reduced in households with sampled children in order to limit the response burden on the 
households. 

■ In the NHES: 93, up to two children in the household were sampled for the SR component 
and up to two were sampled for the SS&D component. No child was sampled as the 
subject for more than one component. 

■ In the NHES: 95, up to two children in each household were sampled for the ECPP 
component and up to two adults were sampled for the AE component. Thus, within a 
single household, up to four interviews may have been completed. However, sampling 
two adults within a household for AE interviews was relatively rare and only done in 
households containing adult education participants with less than a high school education. 

■ In the NHES:96, up to two children in each household were sampled for the Parent 
PFI/CI component. Two were sampled only when both younger children (age 3 through 
5th grade) and older children (grades 6 through 12) were in the household. Thus, within 
a single household, interviews may have been completed about two children who are 
related. A small proportion of telephone numbers from the NHES: 96 sample was 
randomly selected for an Adult Cl interview rather than a Parent PFI/CI or Youth Cl 
interview. There were no households in which an interview about an adult and an 
interview about a child were conducted. Therefore, in households in which a member 
was selected for an adult Cl interview, no interviews about children were conducted. 

The first 8 digits of the case identification number constitute the household identification, and can be used 
to identify multiple interviews within the same household. The User’s Manual for each NHES component 
explains in more detail how to do this. 
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What measures have been taken in NHES to insure against bias in the data? 

About 94 percent of persons in the United States live in households with telephones. Those without 
telephones are different from those with telephones in some ways, notably in terms of their socioeconomic 
status. In the NHES, special weighting procedures are used to adjust the survey estimates to match totals 
from the Current Population Survey, using poststratification variables that are associated with telephone 
coverage. Additional information on telephone coverage bias is included in the Data File User’s Manuals 
and in the NCES technical reports Telephone Undercoverage Bias of 14- to 21 -year-olds and 3- to 5-year- 
olds (NCES Publication No. 92-101), Undercoverage Bias in Estimates of Characteristics of Adults and 
0- to 2-Year-Olds in the 1995 National Household Education Survey (NCES Working Paper 96-29), and 
Undercoverage Bias in Estimates of Characteristics of Households and Adults in the 1996 National 
Household Education Survey (Brick forthcoming). A technical report, Adjusting for Coverage Bias Using 
Telephone Service Interruption Data in the 1993 National Household Education Survey (NCES Publication 
No. 97-336) presents research on the interruption of telephone service, that is, the extent to which 
households move in and out of telephone coverage. 

Nonresponse is another important potential source of bias in any survey. The NHES project uses a 
calling protocol, refusal conversion efforts, and implementation of a Spanish language questionnaire to 
minimize bias resulting from unit (questionnaire) nonresponse. Item response for the NHES instruments 
is very high, more than 98 percent for nearly all items. Missing values were imputed for all items on 
the NHES: 93, NHES: 95, and NHES: 96 public files; selected items were imputed in the NHES:91 files. 

What do the negative numbers in the data sets mean? 

The NHES data sets contain negative codes to designate missing data. In this way, it is possible to utilize 
the same missing data codes for all items regardless of the length of the field (number of columns) 
occupied by the data element. This enhances the uniformity of the file characteristics and simplifies the 
identification of missing values for data file users. 

For all NHES data sets, a -1 (negative one) designates a legitimate skip, or cases for whom the variable 
is not appropriate. For example, if a person says that his/her child attends a public school, a -1 will 
appear in the item that asks if the school is affiliated with a religion, since the question is inappropriate. 

The NHES: 93, NHES: 95, and NHES: 96 data sets were fully imputed, so they do not contain other 
negative codes. In the NHES:91, however, only variables required for weighting or contributing to key 
derived variables were imputed. Other variables contain additional missing value codes. These are: -7, 
refused; -8, don’t know, and -9, not ascertained. The last of these (-9) was assigned by data preparation 
staff during data cleaning and problem resolution; -7 and -8 were assigned during the interview. 

What are the derived variables and how were they created? 

Derived (or composite) variables are analytically useful constructs that are created using two or more 
variables. For example, an analyst may want to use the highest education of either parent in a household 
as a measure of socioeconomic status. The derived variable PARGRADE was constructed for this 
purpose, and appears in the Preprimary (NHES:91), Primary (NHES:91), School Readiness (NHES:93), 
School Safety and Discipline (NHES:93), Early Childhood Program Participation (NHES:95), Parent and 
Family Involvement in Education and Civic Involvement (NHES: 96), and Youth Civic Involvement 
(NHES:96) files. PARGRADE is composed of four variables: MOMGRADE (mother’s highest grade), 
MOMDIPL (mother has a high school diploma or equivalent), DADGRADE (father’s highest grade), and 
DADDIPL (father has a high school diploma or equivalent). 
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Some other derived variables are "counters," e.g., household size. The derived variables are designed 
to facilitate analysis by providing measures that are likely to be useful. The Data File User’s Manuals 
include a discussion of each derived variable and its composition (chapter 6 of each User’s Manual). The 
manuals also contain the SAS code used for creating derived variables in appendices. 

What if Vm interested in only a part of the population? 

Many times, the research questions guiding one’s analyses pertain to a specific population. That 
population may be defined by age, grade in school, family type, or other factors. Each of the NHES data 
files contains variables that can be used to subset the data file to include the population of interest. 

A key variable provided for this purpose is called MAINRSLT (main interview result). This is a CATI 
system variable that defines the completion status of an interview, often defining a subpopulation to which 
the subject belongs. MAINRSLT can be used to subset the NHES:91 and NHES: 95 Adult Education 
files, separating participants and nonparticipants. For the NHES:91 ECE component, MAINRSLT was 
used to create two separate data files - Preprimary (preschoolers and kindergartners) and Primary 
(primary school students). In the NHES:93 SR data set, MAINRSLT can be used to select preschoolers, 
kindergartners, primary students, and home schoolers. In the NHES:93 SS&D data set, MAINRSLT can 
be used to divide the data file between parents and youth, and by grade level (3rd through 5th, or 6th 
through 12th). In the NHES:95 ECPP data set, MAINRSLT differentiates infants and toddlers, 
preschoolers, kindergartners, primary schoolers, and home schoolers. In the NHES: 96 Parent PFI/CI 
data file, MAINRSLT delineates preschoolers, elementary schoolers, middle schoolers, high schoolers, 
and home schoolers. 

Another variable that can be used to subset some of the files is ALLGRADE (the enrollment status and 
grade of child). Analysts can use ALLGRADE to select only kindergartners from the Preprimary file, 
only 9th through 12th graders from the SS&D file, and so on. Each analyst will want to review the 
available measures in order to define the population of interest to him or her. 

The Electronic CodeBook (ECB) program for the NHES facilitates efforts to subset the population. 
When preparing to write the extract program, the user is presented with a dialog box that allows him or 
her to define the extract population by age, race/ethnicity, sex, and enrollment status. By default, the 
extract population is all respondents in the catalog. 

How do / calculate the standard errors for the NHES data? 

The NHES sample designs are complex, multi-stage designs. As a result, it is erroneous to calculate 
standard errors for the estimates under simple random sampling assumptions. Each NHES data set 
includes two sets of variables that can be used to estimate the standard errors of statistics. 

Replicate weights provided on each data file can be used to calculated standard errors using WesVarPC. 
This program uses replication methods to estimate standard errors. 

PSU and STRATUM also appear on each data file and can be used in Taylor series approximations. There 
are software packages available for calculating standard errors using the Taylor series approach. Among 
these are SUDAAN and PCCARP. 



See section 8 of this guide for additional discussion of variance estimation for the NHES data. 
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NHES:91/93/95/96 Examples to Illustrate Points Made in the Text 
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Example 1: Checking a Subsetted File 

This example shows ways in which a data user can check that his/her subsetting of a data file is correct 
(see section 5.2 for a discussion of subsetting NHES data files). For example, let’s say that an analyst 
has subset the School Readiness file by selecting on MAINRSLT (the completion code for an interview), 
and selected preschoolers as the analysis population. There are two checks the analyst can do to be sure 
that the proper cases are included. Both checks involve comparing numbers from the file output and 
numbers from the codebook. In this example, we compare the total number of cases in the subfile with 
the total number of completed CN interviews in the codebook. As shown below, the number of cases 
in the subfile is 4,423 which matches the number of cases in the total file with a MAINRSLT value of 
CN. 

Often, it is useful to select another variable to do a confirmatory check. For this example, we have 
selected ALLGRADE, a derived variable representing the child’s enrollment status and grade. By 
definition, the derived variable ALLGRADE should equal ’not enrolled’ or ’ nursery /pre-k/Head Start’ 
for all preschoolers (CNs). Comparing the SAS run from the extracted file with the codebook values for 
ALLGRADE, we see that the numbers of children enrolled in preschool (grade N) is the same (n=2,084) 
and no children in the extract file have inappropriate values for ALLGRADE (e.g., kindergarten or a 
primary grade). 

Check #1 — Compare total number of CNs in MAINRSLT from the codebook with the total number of 
CNs in the SAS run from the subfile. 



***************************************************************************** 

FROM ECB CODEBOOK 

***************************************************************************** 



MAINRSLT = INTERVIEW COMPLETION STATUS 

RECORD: 1 POSITION: 11-12 
FORMAT: A2 









UNWGTD 


WGTD 


RESPONSE CODES 


FREQ 


PERCENT 


PERCENT 


CH COMPLETE HOME SCHOOL INTERVIEW 


CH 


62 


0.6% 


0.5% 


CK COMPLETE KINDERGARTEN INTERVIEW 


CK 


2126 


19.5% 


19.7% 


CN COMPLETE PRESCHOOL INTERVIEW 


CN 


4423 


40.6% 


42.8% 


CS COMPLETE PRIMARY SCHOOL INTERVIEW 


CS 


4277 


39.3% 


37.0% 


TOTALS: 




10888 


100.0% 


100.0% 
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*********************************************************************************** 

SAS OUTPUT 

************************************************************************************ 



TABLE OF MAINRSLT BY ALLGRADE 



MAINRSLT (INTERVIEW 
COMPLETION STATUS) 


ALLGRADE (D-CHILD'S 
ENROLLMENT AND GRADE 
EQUIV) 






Frequency 








Percent 








Row Pet 








Col Pet 


N NURS/PREK/HDST 


0 NOT 


TOTAL 






ENROLLED 




CN COMPLETE PRE 


2084 


2339 


4423 




47.12 


52.88 


100.00 




47.12 


52.88 






100.00 


100.00 




Total 


2084 


2339 


4423 




47.12 


52.88 


100.00 



Check #2 — Compare the number of preschoolers (CNs) in nursery/pre-k/Head Start programs in the 
codebook using ALLGRADE (n=2,084) with the number of preschoolers in these programs in the SAS 
run from the subfile (n=2,084). The number of preschoolers (CNs) who are not enrolled in the subfile 
(n=2,339) does not match the total number of children not enrolled in ALLGRADE in the codebook 
(n = 2,340); however, this is not problematic. This is because all preschoolers in the subfile have 
appropriate ALLGRADE values and because children with ALLGRADE = 0 may have a MAINRSLT 
value other than CN. There is one case with ALLGRADE=0 and MAINRSLT =CK. This can be seen 
in a crosstabulation between ALLGRADE and MAINRSLT for all children (shown on next page). This 
child was reported to not be enrolled in school (ENROLL =2) and to not be in home school 
(HOMESCHL=2) and was assigned a MAINRSLT =CK instead of CN based on his/her age 
(AGE92 = 7). 

************************************************************************************ 

FROM ECB CODEBOOK 

************************************************************************************ 



ALLGRADE = D-CHILD'S ENROLLMENT AND GRADE/EQUIV 



RESPONSE 


CODES 


FREQ 


UNWGTD 

PERCENT 


WGTD 

PERCENT 


0 NOT ENROLLED 


0 


2340 


21.5% 


23.2% 


N NURS/PREK/HDST 


N 


2084 


19.1% 


19.6% 


T TRANS KIND 


T 


85 


0.8% 


0.8% 


K KINDERGARTEN 


K 


2062 


18.9% 


19.1% 


P PRE/TRANS FRST 


P 


13 


0.1% 


0.1% 


1 1 ST GRD/EQUIV 


1 


2147 


19.7% 


19.8% 


2 2ND GRD/EQUIV 


2 


2140 


19.7% 


17.1% 


3 3RD GRD/EQUIV 


3 


17 


0.2% 


0.3% 


TOTALS: 




10888 


100.0% 


100.0% 
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***************************************************************************** 



SAS OUTPUT 

***************************************************************************** 



TABLE OF MAINRSLT BY ALLGRADE (all children) 



MAINRSLT (INTERVIEW 
COMPLETION STATUS) 


ALLGRADE 

(D-CHILD'S ENROLLMENT AND GRADE EQUIV) 


Frequency 




















Percent 


0 NOT 


1 1ST 


2 2ND 


3 3RD 


K 


N 


P 


T 




Row Pet 


ENROLLED 


GRD/ 


GRD/ 


GRD/ 


KIND 


NURS/ 


PRE/ 


TRANS 




Col Pet 




EQUIV 


EQUIV 


EQUIV 


ERGAR 


PREK/ 


TRANS 


KIND 


TOTAL 














HDST 


FRST 






CH COMPLETE HOM 


0 


17 


10 


0 


32 


0 


0 


3 


62 




0.00 


0.16 


0.09 


0.00 


0.29 


0.00 


0.00 


0.03 


0.57 




0.00 


27.42 


16.13 


0.00 


51.61 


0.00 


0.00 


4.84 






0.00 


0.79 


0.47 


0.00 


1.55 


0.00 


0.00 


3.53 




CK COMPLETE KIN 


1 


0 


0 


0 


2030 


0 


13 


82 


2126 




0.01 


0.00 


0.00 


0.00 


18.64 


0.00 


0.12 


0.75 


19.53 




0.05 


0.00 


0.00 


0.00 


95.48 


0.00 


0.61 


3.86 






0.04 


0.00 


0.00 


0.00 


98.45 


0.00 


100.00 


96.47 




CN COMPLETE PRE 


2339 


0 


0 


0 


0 


2084 


0 


0 


4423 




21.48 


0.00 


0.00 


0.00 


0.00 


19.14 


0.00 


0.00 


40.62 




52.88 


0.00 


0.00 


0.00 


0.00 


47.12 


0.00 


0.00 






99.96 


0.00 


0.00 


0.00 


0.00 


100.00 


0.00 


0.00 




CS COMPLETE PRI 


0 


2130 


2130 


17 


0 


0 


0 


0 


4277 




0.00 


19.56 


19.56 


0.16 


0.00 


0.00 


0.00 


0.00 


39.28 




0.00 


49.80 


49.80 


0.40 


0.00 


0.00 


0.00 


0.00 






0.00 


99.21 


99.53 


1 00.00 


0.00 


0.00 


0.00 


0.00 




TOTAL 


2340 


2147 


2140 


17 


2062 


2084 


13 


85 


10888 




21.49 


19.72 


19.65 


0.16 


18.94 


19.14 


0.12 


0.78 


100.00 
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Example 2: Missing Values 



As discussed in section 7, full imputation was performed on the NHES:93, NHES:95, and NHES:96 data 
sets. Therefore, all missing values (denoted by the code -1) are legitimate skips. A legitimate skip can 
occur for two reasons. First, answers to one or more previously asked question(s) can result in a skip. 
For example, if a child attends a public school, the question about whether a private school was affiliated 
with a religion would equal -1. This example is illustrated by the SAS run below on NHES:93 School 
Readiness data. 

************************************************************************************ 

SAS RUN 

************************************************************************************ 



Table of PPUBL by MAINRSLT 



PPUBL (R71 - Current school public or private) 
MAINRSLT (Interview completion status) 



Frequency 
Percent 
Row Pet 
Col Pet 


CH 


CK 


CN 


CS 


Total 


-1 


62 


2126 


4423 


0 


6611 


Inapplicable 


0.57 


19.53 


40.62 


0.00 


60.72 




0.94 


32.16 


66.90 


0.00 






100.00 


100.00 


100.00 


0.00 




1 Public 


0 


0 


0 


3750 


3750 




0.00 


0.00 


0.00 


34.44 


34.44 




0.00 


0.00 


0.00 


100.00 






0.00 


0.00 


0.00 


87.68 




2 Private 


0 


0 


0 


527 


527 




0.00 


0.00 


0.00 


4.84 


4.84 




0.00 


0.00 


0.00 


100.00 






0.00 


0.00 


0.00 


12.32 




Total 


62 


2126 


4423 


4277 


10888 




0.57 


19.53 


40.62 


39.28 


100.00 



Of the primary school path (CS) respondents who were administered this item (PPUBL) in the School 
Readiness interview, 527 answered that the child attended a private school and were subsequently asked 
the next item (PCHURCH). All other respondents (n= 10,361) were assigned a value of -1 for 
PCHURCH either because their child attends a public school or because their child is not in primary 
school (CH, CK, and CN). 
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Table of PCHURCH by MAINRSLT 



PCHURCH (R73-Religion-affiliated school) 
MAINRSLT (Interview completion status) 



Frequency 
Percent 
Row Pet 
Col Pet 


CH 


CK 


CN 


CS 


Total 


-1 


62 


2126 


4423 


3750 


10361 


Inapplicable 


0.57 


19.53 


40.62 


34.44 


95.16 




0.60 


20.52 


42.69 


36.19 






100.00 


100.00 


100.00 


87.68 




1 Religion- 


0 


0 


0 


427 


427 


affiliated 


0.00 


0.00 


0.00 


3.92 


3.92 




0.00 


0.00 


0.00 


100.00 






0.00 


0.00 


0.00 


9.98 




2 Not 


0 


0 


0 


100 


100 


religion- 


0.00 


0.00 


0.00 


0.92 


0.92 


affiliated 


0.00 


0.00 


0.00 


100.00 






0.00 


0.00 


0.00 


2.34 




Total 


62 

0.57 


2126 

19.53 


4423 

40.62 


4277 

39.28 


10888 

100.00 
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An entire block of items can be set equal to -1 if the items are not applicable to the interview path. As 
illustrated by the SAS run below, only preschoolers (MAINRSLT = CN) are asked items from the 
School Readiness developmental profile series; the other paths (CH, CK, and CS) are set to -1 for all 
items in the section (n = 6,465). 



Table of DPPENCIL by MAINRSLT 

DPPENCIL (R1 9-CHILD HOLDS PENCIL PROPERLY) 

MAINRSLT (INTERVIEW COMPLETION STATUS) 



Frequency 












Percent 












Row Pet 












Col Pet 


CH 


CK 


CN 


CS 


Total 


-i 


62 


2126 


0 


4277 


6465 


Inapplicable 


0.57 


19.53 


0.00 


39.28 


59.38 




0.96 


32.88 


0.00 


66.16 






100.00 


100.00 


0.00 






1 Yes 


0 


0 


4013 


0 


4013 




0.00 


0.00 


36.86 


0.00 


36.86 




0.00 


0.00 


100.00 


0.00 






0.00 


0.00 


90.73 


0.00 




2 No 


0 


0 


410 


0 


410 




0.00 


0.00 


3.77 


0.00 


3.77 




0.00 


0.00 


100.00 


0.00 






0.00 


0.00 


9.27 


0.00 




Total 


62 


2126 


4423 


4277 


10888 




0.57 


19.53 


40.62 


39.28 


100.00 




53 

54 



Example 3: Comparison of wordings and response categories for similar items in NHES:91, 

NHES:93, NHES:95, and NHES:96 

Section 4.2 discussed some potential pitfalls in data preparation, including "variant variables." This 
example presents frequencies for some items appearing in the NHES:91, NHES:93, NHES:95 and 
NHES:96. Note that item wording and categories are not identical. When comparing similar items from 
multiple survey administrations, the researcher should carefully compare the wording of each item and 
the population of whom the item was asked. 

In the NHES:91, two reading questions were asked. The first asked about general reading frequency, 
and was not tied to a specific time period. The second asked about reading to the child in the week prior 
to the interview. Both items were asked of the full sample. 

************************************************************************************ 

NHES:91 - P19 

************************************************************************************ 



READTO - PI 9 HOW OFTEN READ TO CHILD 

PI 9. About how often do you (and OTHER PARENT/GUARDIAN) read stories to (CHILD)? 



RESPONSE 


CODES 


FREQ 


UNWGTD 

PERCENT 


WGTD 

PERCENT 


NEVER 


1 


74 


1 .0% 


1 .0% 


SEVERAL TIMES A YEAR 


2 


180 


2.4% 


2.4% 


SEVERAL TIMES A MONTH 


3 


1076 


14.1% 


14.8% 


AT LEAST THREE TIMES A WEEK 


4 


2926 


38.2% 


38.8% 


EVERY DAY 


5 


3390 


44.3% 


43.0% 


RESERVED CODES: 
DK 


-8 


7 


0.1% 


(MISS) 


NOT ASCERTAINED 


-9 


2 


0.0% 


(MISS) 


TOTALS: 




7655 


100.00% 


100.00% 



NOTE: Item PI 9 is from the NHES:91 Preprimary data file. This variable also appears in the NHES:91 Primary data 
file as question E36. 



ERIC 
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**** ** *** ** ******* ** ********************************* ***** afofcafcsfcsfofofeateatofcateateateateate *********** 



NHES:91 - P24 

***************************************************************************** ******* 



WKREAD - P24 READ TO CHILD LAST WK 



P24. In the past week, have you or someone in your family done the following this with (child)? Read 
to (him/her)? 



RESPONSE 


CODES 


FREQ 


UNWGTD 

PERCENT 


WGTD 

PERCENT 


YES 


1 


7218 


94.3% 


94.2% 


NO 


2 


414 


5.4% 


5.8% 


RESERVED CODES: 
DON'T KNOW 


-8 


1 1 


0.1% 


(MISS) 


NOT ASCERTAINED 


-9 


12 


0.2% 


(MISS) 


TOTALS: 




7655 


100.0% 


100.0% 



WKREADN - P24 # TIMES READ TO CHILD IN PAST WK 

P24. How many times? Would you say one or two times or three or more? 



RESPONSE 


CODES 


FREQ 


UNWGTD 

PERCENT 


WGTD 

PERCENT 


ONE OR TWO TIMES 


1 


1676 


21.9% 


24.0% 


THREE OR MORE TIMES 


2 


5539 


72.4% 


77.0% 


RESERVED CODES: 
INAPPLICABLE 


-1 


437 


5.7% 


(MISS) 


DON'T KNOW 


-8 


3 


0.0% 


(MISS) 


TOTALS 




10888 


1 00.0% 


100.0% 



NOTE: Item P24 is from the Preprimary file. This item also appears on the Primary file as question E44. 
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In the NHES:93 SR component, two versions of a reading question were asked again. Both items focused 
on the numbers of times the child had been read to in the week prior to the interview. One was a single 
item, and the other was a three-stage item. Each was asked of a split-half sample of respondents. 

************************************************************************************ 

NHES:93 - R96A 

************************************************************************************ 



READTIME = R96A-TIME FAMILY READ TO CHILD LAST WK 



R96A. Now I'd like to talk with you about activities in your home in the past week. How many times 
have you or someone in your family read to (child) in the past week? Would you say. . . not at all, 
once or twice, three or more times, or every day? 



RESPONSE 


CODES 


FREQ 


UNWGTD 

PERCENT 


WGTD 

PERCENT 


1 NOT AT ALL 


1 


431 


4.0% 


7.7% 


2 ONCE OR TWICE 


2 


1048 


9.6% 


19.6% 


3 3 OR MORE TIMES 


3 


15559 


14.3% 


29.0% 


4 EVERY DAY 


4 


2359 


21.7% 


43.7% 


RESERVED CODES 
-1 INAPPLICABLE 


-1 


5491 


50.4% 


(MISS) 


TOTALS: 




10888 


100.0% 


100.0% 



NHES:93 - R96, R97, R98 

************************************************************************************ 



READTO - R96-FAMILY MEMBER READ TO CHILD LAST WK 



R96. Now I'd like to talk with you about activities in your home in the past week. In the past week, 
have you or has someone in your family read to (child)? 



RESPONSE 


CODES 


FREQ 


UNWGTD 

PERCENT 


WGTD 

PERCENT 


1 YES 


1 


4926 


45.2% 


90.3% 


2 NO 


2 


565 


5.2% 


9.7% 


RESERVED CODES: 
-1 INAPPLICABLE 


-1 


5397 


49.6% 


(MISS) 


TOTALS: 




10888 


100.0% 


100.0% 
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READTON - R97-TIMES/WK FAMILY READ TO CHILD 

R97. How many times? Would you say . . .one or two times or three or more times? 



RESPONSE 


CODES 


FREQ 


UNWGTD 

PERCENT 


WGTD 

PERCENT 


1 ONE OR TWO TIMES 


1 


1 118 


10.3% 


22.4% 


2 THREE OR MORE TIMES 


2 


3808 


35.0 


77.6% 


RESERVED CODES: 
-1 INAPPLICABLE 


-1 


5962 


54.8% 


(MISS) 


TOTALS 




10888 


100.0% 


100.0% 



READDAY - R98-READING EVERY DAY IN LAST WEEK 








R98. Was that every day in the past week? 
RESPONSE 


CODES 


FREQ 


UNWGTD 

PERCENT 


WGTD 

PERCENT 


1 YES 


1 


2563 


23.5% 


6.6% 


2 NO 


2 


1245 


1 1 .4% 


32.4% 


RESERVED CODES: 
-1 INAPPLICABLE 


-1 


7080 


65.0% 


(MISS) 


TOTALS: 




10888 


100.0% 


100.0% 



In the NHES:95 ECPP component and the NHES:96 Parent PFI/CI component, there was a single item 
asking about the frequency with which family members read to the child in the past week. 

********* *********** ********* **** ****** ***************** ***** ******* * * * * ********* * 

NHES:95 - LI 

**************************************** :K ***, K *, K **** lK##lKJK# , K######+#############j|cj|cj|cj|cj|cj|c 



HAREADFM = LI -TIMES FAMILY READ TO CHILD LAST WK 



LI . How many times have you or someone in your family read to (child) in the past week? Would you 
say. . . not at all, once or twice, three or more times, or every day? 



RESPONSE 


CODES 


FREQ 


UNWGTD 

PERCENT 


WGTD 

PERCENT 


1 NOT AT ALL 


1 


1360 


9.7% 


9.7% 


2 ONCE OR TWICE 


2 


2458 


17.5% 


17.6% 


3 3 OR MORE TIMES 


3 


3496 


24.9% 


24.4% 


4 EVERY DAY 


4 


6750 


48.0% 


48.3% 


TOTALS: 




14064 


100.0% 


100.0% 




58 57 



************************************************************************************ 

NHES:96 - PI1 

************************************************************************************ 
FOREADTO = P1 1 -TIMES READ TO CHLD PAST WK 

P1 1 . How many times have you or someone in your family read to (CHILD) in the past week? Would 
you say. . . not at all, once or twice, three or more times, or every day? 



RESPONSE 


CODES 


FREQ 


UNWGTD 

PERCENT 


WGTD 

PERCENT 


1 NOT AT ALL 


1 


622 


3.0% 


8.0% 


2 ONCE OR TWICE 


2 


1725 


8.3% 


21.6% 


3 3 OR MORE TIMES 


3 


2358 


1 1 .3% 


27.9% 


4 EVERY DAY 


4 


3707 


17.8% 


42.5% 


RESERVED CODES: 
-1 INAPPLICABLE 


-1 


12380 


59.5% 


(MISS) 


TOTALS: 




20792 


100.0% 


100.0% 



An analyst wishing to examine trends in reading over time should use items that focus on reading 
activities in the past week. The general reading question in the NHES:91 is not tied to a specific time 
frame and is therefore not comparable. The greatest comparability over time is found in items P24 in 
the NHES:91 (WKREAD, WKREADN), R96/R97 in the NHES:93 (READTO, READTON), LI in the 
NHES:95 (HAREADFM), and PI1 in the NHES:96 (FOREADTO). 
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APPENDIX C 



NHES:91/93/95/96 Summary of Weighting and 
Sample Variance Estimation Variables 
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